


NONLINEAR FILTERS FOR HIDDEN MARKOV MODELS OF REGIME CHANGE 
WITH FAST MEAN-REVERTING STATES* 

ANDREW PAPANICOLAOU

Abstract. We consider filtering for a hidden Markov model that evolves with multiple time scales in the hidden states.
In particular, we consider the case where one of the states is a scaled Ornstein-Uhlenbeck process with fast reversion to a
shifting mean that is controlled by a continuous-time Markov chain modeling regime change. We show that the nonlinear filter
for such a process can be approximated by an averaged filter that asymptotically coincides with the true nonlinear filter of
the regime-changing Markov chain as the rate of mean reversion approaches infinity. The asymptotics exploit weak convergence
of the state variables to an invariant distribution, which is significantly different from the strong convergence used to obtain
asymptotic results in [19].

1. Introduction. When considering systems of stochastic differential equations with multiple time
scales, hidden Markov models (HMMs) come up because we often want to include regime changes. In
estimating the hidden states, filtering methods are used to obtain posterior distributions. If fast mean-reversion
is involved, one might expect that the posterior distribution of the mean-reverting states is tempered
only by an invariant measure and not by the history of the data. This is indeed correct, provided that
observations are sampled discretely, allowing for relaxation in the distribution of the fast mean-reverting
states by the time new observations arrive. Thus, it is reasonable to approximate the filtering distribution
with an asymptotic average over these states.

An asymptotic nonlinear filter can be useful because it may be of reduced dimension, and therefore
considerably easier to compute. By taking a Bayesian approach to filtering, the posterior distributions are
computed by integrating exhaustively over all states. These integrations are prohibitively slow for systems
with hidden states of several dimensions. In contrast, the computational demands for filters of linear systems
with Gaussian white noise are usually not an issue because we use Kalman filters. Kalman filters are obtained from
projections onto the linear subspace of observable data, while filtering in the presence of nonlinearity and/or
non-Gaussian components does not allow this linear theory. However, it is possible to find faster nonlinear
filtering algorithms when the HMM has states whose law quickly converges to an invariant measure. Such
filters are faster to compute because it becomes unnecessary to track the hidden states where convergence
quickly occurs.

Asymptotic analysis of multi-scale stochastic models has been done extensively by Yin and Zhang in
[251 [5S], in which they also analyze the asymptotics of the Kalman filter and the Wonham filter. Multi-scale
asymptotics are also of interest in stochastic volatility models [10], [15], which suggests that filtering is also
useful in finance. In the setting of multiple time-scales (in particular HMMs with fast mean-reverting states),
nonlinear filtering has, to our knowledge, not been explored to its full extent. The basic nonlinear filtering
theory is presented by Rabiner [22], Jazwinski [TB], and Yin and Zhang [21] as a fundamental Bayesian
method to track the hidden states of nonlinear HMMs. Nonlinear filtering has been used extensively in
target-tracking [3], [21], but in these types of problems it may be more efficient to generate particles for
the nonlinear components of the state space and then exploit the Kalman filter everywhere else, as is done
in [14], [27], where the authors propose the use of marginalized filters which achieve a Rao-Blackwell lower bound
given the sufficient statistics provided by the Kalman filter. Essentially, nonlinear filtering of continuous
state-space random processes needs to be numerically computed with some kind of approximating scheme.
An important result in this area is that of Kushner [17], which shows consistency and stability for Markov
chain approximations of the nonlinear filter with continuous time observations. In general, particle filters
[TJ [H] are a method used frequently for approximating the nonlinear filter, but the amount of time required
to generate the necessary Monte Carlo samples can be a deterrent in practice.

In this paper, we consider in detail a nonlinear multi-scale model with a mean-reverting state that
reverts faster as a parameter $\epsilon > 0$ gets smaller. The model has applications in target-tracking, in finance
and elsewhere. In the early sections we discuss how a distributional averaging occurs in the posterior as
$\epsilon \searrow 0$ if the data is sampled discretely, and we also give some examples to illustrate some possible variations
of the model. We then switch to a rigorous analysis and provide a detailed proof showing that indeed an
averaged filter is obtained as $\epsilon \searrow 0$, provided that appropriate assumptions are made when defining the
model. We also present some numerical experiments to illustrate how such averaged filters perform compared
to other approximations of the nonlinear filter.

While the theory presented is for a specific model, the proofs can be carried out for other, more general
processes of a similar nature. This paper will focus on models where the fast mean-reverting state is a
one-dimensional Ornstein-Uhlenbeck process, but the theory can be reworked for other diffusions such as
Cox-Ingersoll-Ross processes. Also, the theory is easily generalized to the multivariate setting, both for cases
when observations are a vector and when the OU is a vector. In fact, a model with a multivariate OU
process and a scalar-valued regime process can be well-managed by our filter of reduced dimensionality if
the asymptotic approximations apply. Parameter estimation is also an issue of interest, but one which we do
not cover in this paper. The expectation maximization (EM) algorithm is implementable for the asymptotic
model and will use the asymptotic filter that is computed in this paper; however, the asymptotics of EM are
perhaps a topic for future research.

This paper is organized as follows: in Section 2 we present the HMM which is considered in this paper;
the associated nonlinear filter is written in Section 2.2; the asymptotically averaged filter is given in Section
2.3; some example applications are given in Section 2.4; and Section 2.5 describes some particular effects
that scaling and observation sampling can have on the asymptotic filter. In Section 3 we restate the model
and the problem in a mathematically rigorous manner, and then go on to a proof, while relegating to
Appendix A some of the details that are included for the sake of completeness. Finally, in Section 4 we give
a detailed account of a numerical simulation to illustrate how the averaged filter performs relative to some of
the parameter choices.

2. Formulation. In this section we introduce the model that is the focus of the paper. The notation of
Sections 2.1 and 2.2 is used throughout the paper. In Section 2.3 we discuss the averaged filter that occurs as
the rate of mean reversion approaches infinity, in Section 2.4 we give some examples, and in Section 2.5 we consider changes
in the averaged filter when using a somewhat different scaling.



2.1. The Process. Given the parameter $\epsilon > 0$, which denotes the mean reversion time, we let $t \geq 0$
denote time and we consider a hidden Markov model (HMM) consisting of the following processes:

$\Theta^\epsilon(t)$ = a hidden Markov chain in a finite state space

$X^\epsilon(t)$ = a hidden diffusion with mean reverting properties

$Y^\epsilon(t)$ = an observed diffusion process.

The hidden pair $(X^\epsilon(t), \Theta^\epsilon(t)) \in \mathbb{R} \times S$ is a Markov process with $X^\epsilon(t)$ being a real-valued process and $\Theta^\epsilon(t)$
taking value in the finite discrete space $S = \{s_1, \dots, s_M\}$. The forward Kolmogorov equation for $\Theta^\epsilon(t)$ is
written in terms of a transition intensity matrix $Q \in \mathbb{R}^{M \times M}$ that is a function of $X^\epsilon(t)$,

$$\frac{d}{dt} P(\Theta^\epsilon(t) = s_i) = \sum_j Q_{ji}(X^\epsilon(t))\, P(\Theta^\epsilon(t) = s_j), \qquad t > 0 \tag{2.1}$$

$$P(\Theta^\epsilon(0) = s_i) = p_0(s_i) \tag{2.2}$$

for all $i \leq M$. We assume that jumps in $\Theta^\epsilon(t)$ cannot occur at an infinite rate, and that there are no
cemetery states. Therefore there are constants $\beta$ and $\alpha$ such that

$$\sup_x \left(-Q_{ii}(x)\right) < \beta < \infty \tag{2.3}$$

$$\inf_x \left(-Q_{ii}(x)\right) > \alpha > 0 \tag{2.4}$$

for all $i \leq M$ and all $x \in \mathbb{R}$. The process $X^\epsilon(t)$ is a type of Ornstein-Uhlenbeck (OU) process with $\Theta^\epsilon(t)$ serving as
a regime-changing shifting mean,

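$$dX^\epsilon(t) = \frac{1}{\epsilon}\left(\Theta^\epsilon(t) - X^\epsilon(t)\right)dt + \frac{1}{\sqrt{\epsilon}}\,dW(t) \tag{2.5}$$

where $W(t)$ is an independent Wiener process and the law of $X^\epsilon(0)$ is given. We say that $1/\epsilon$ is the rate of
mean-reversion for $X^\epsilon(t)$. This model is a regime-switching OU, and as $\epsilon \searrow 0$ there is fast relaxation to the
local equilibrium. The generator of $(\Theta^\epsilon(t), X^\epsilon(t))$ is $\frac{1}{\epsilon}\mathcal{L} + Q(x)$, acting on smooth functions of $x \in \mathbb{R}$ and
$s_i \in S$, where

$$\mathcal{L} = \begin{pmatrix} \mathcal{L}_1 & 0 & \cdots & 0 \\ 0 & \mathcal{L}_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \mathcal{L}_M \end{pmatrix}
\qquad \text{and} \qquad
\mathcal{L}_i = (s_i - x)\frac{\partial}{\partial x} + \frac{1}{2}\frac{\partial^2}{\partial x^2}.$$

For any $x, x' \in \mathbb{R}$, $i, j \leq M$ and any $t > 0$, we let $\Lambda^\epsilon_t$ denote the transition density function, which is the
solution of the Fokker-Planck equation

$$\left(\frac{\partial}{\partial t} - \frac{1}{\epsilon}\mathcal{L}^*\right)\Lambda^\epsilon_t(x, s_i \mid x', s_j) = \sum_\ell Q_{\ell i}(x)\,\Lambda^\epsilon_t(x, s_\ell \mid x', s_j),$$

$$\Lambda^\epsilon_0(x, s_i \mid x', s_j) = \delta(x - x')\,\delta_{ij},$$

where $\mathcal{L}^*$ is the adjoint of $\mathcal{L}$. In particular, the transition probability over a time-step $\Delta t$ is given by $\Lambda^\epsilon_{\Delta t}$,

$$\Lambda^\epsilon_{\Delta t}(x, s_i \mid x', s_j) = \frac{\partial}{\partial x} P\left(X^\epsilon(t + \Delta t) \leq x,\ \Theta^\epsilon(t + \Delta t) = s_i \mid X^\epsilon(t) = x',\ \Theta^\epsilon(t) = s_j\right).$$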

Samples of this type of OU process with $Q(x) = Q$ (i.e. $Q$ is constant in $x$) are shown in Figure
2.1, where we see that the stationary behavior of $X^\epsilon(t)$ around $\Theta^\epsilon(t)$ sets in more and more rapidly as $\epsilon$
decreases.

[Figure 2.1: O-U Reversion and Relaxation Behavior for Various $\epsilon$]

Fig. 2.1. An example of the proposed OU-type process with $Q(x) = Q$ (i.e. $Q$ is a constant independent of $x$). In
this example we see the stationary behavior of $X^\epsilon(t)$ around $\Theta^\epsilon(t)$ occur faster and faster as $\epsilon$ decreases.
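The relaxation behavior described above can be reproduced with a short simulation. The following is only an illustrative sketch, not the code used for the figure: the function name `simulate_regime_ou`, the grid step `dt`, the two-state chain, and the use of first-order jump probabilities $I + Q\,dt$ are all assumptions made here. The diffusion is advanced with the exact OU transition for (2.5) while the regime is held frozen over each grid step.

```python
import numpy as np

def simulate_regime_ou(eps, Q, states, T=1.0, dt=1e-3, x0=0.0, i0=0, seed=0):
    """Simulate (Theta, X) on a grid of step dt (illustrative sketch).

    Theta is a continuous-time Markov chain with constant intensity matrix Q,
    advanced with first-order jump probabilities I + Q*dt (needs |Q_ii|*dt < 1).
    X uses the exact OU transition for (2.5) with the regime frozen over a step.
    """
    rng = np.random.default_rng(seed)
    Q = np.asarray(Q, dtype=float)
    states = np.asarray(states, dtype=float)
    n = int(round(T / dt))
    idx = np.empty(n + 1, dtype=int)
    x = np.empty(n + 1)
    idx[0], x[0] = i0, x0
    decay = np.exp(-dt / eps)                    # mean-reversion factor over one step
    noise_sd = np.sqrt(0.5 * (1.0 - decay**2))   # stationary variance of X - Theta is 1/2
    for k in range(n):
        i = idx[k]
        s = states[i]
        x[k + 1] = s + (x[k] - s) * decay + noise_sd * rng.standard_normal()
        p = Q[i] * dt          # off-diagonal entries: probability of each jump
        p[i] += 1.0            # diagonal entry: probability of staying put
        idx[k + 1] = rng.choice(len(states), p=p)
    t = dt * np.arange(n + 1)
    return t, states[idx], x

if __name__ == "__main__":
    Q = [[-1.0, 1.0], [1.0, -1.0]]
    for eps in (0.1, 0.01):
        t, theta, x = simulate_regime_ou(eps, Q, [-1.0, 1.0], T=5.0)
        print(eps, np.mean((x - theta) ** 2))
```

For small $\epsilon$ the printed mean-square distance between $X^\epsilon$ and $\Theta^\epsilon$ should sit near the stationary variance $1/2$, with only brief excursions after each regime change.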



The observation process $Y^\epsilon(t)$ depends on the hidden states since it is given by a noisy and nonlinear
function of the hidden state(s). The continuous time dynamics of $Y^\epsilon(t)$ are described by the following SDE:

$$dY^\epsilon(t) = h(X^\epsilon(t))\,dt + dZ(t)$$

where $Z(t)$ is another independent Wiener process, and $h(\cdot)$ is some continuous and bounded function.
However, the process $Y^\epsilon(t)$ is only discretely observed, and so for some time-step $\Delta t$ there are time
points $t_k = k\Delta t$ for $k = 1, 2, 3, 4, \dots$ such that the observations are given by a discrete sequence $Y^\epsilon_k = Y^\epsilon(t_k)$,
with all the observations up to time $t_k$ denoted by $Y^\epsilon_{1:k}$. The process $Y^\epsilon_k$ is given incrementally as

$$Y^\epsilon_{k+1} = Y^\epsilon_k + \int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau + \Delta Z_k$$

where $Y^\epsilon_{k+1}$ is the observation at time $t_{k+1}$ and $\Delta Z_k = Z(t_{k+1}) - Z(t_k)$. The standard filtering problem for
an HMM of this sort is to find an estimate of the hidden states $(\Theta^\epsilon(t_k), X^\epsilon(t_k))$ which is adapted to the
filtration $\mathcal{F}^\epsilon_k = \sigma(Y^\epsilon_{1:k})$, where $\sigma(Y^\epsilon_{1:k})$ denotes the $\sigma$-algebra generated by the observations.
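The incremental observation model can be sketched numerically as follows. This is a hedged illustration only: the helper `discrete_observations`, the left Riemann sum used for the integral of $h$, the choice $h = \tanh$ as a bounded continuous function, and the convention $Y_0 = 0$ are assumptions introduced here, not part of the model specification.

```python
import numpy as np

def discrete_observations(x_path, fine_dt, steps_per_obs, h=np.tanh, seed=0):
    """Build Y_1, Y_2, ... from a finely sampled path of X (sketch, Y_0 = 0 assumed).

    Each increment is  int_{t_k}^{t_{k+1}} h(X(tau)) dtau + Delta Z_k,
    with the integral replaced by a left Riemann sum on the fine grid.
    """
    rng = np.random.default_rng(seed)
    x_path = np.asarray(x_path, dtype=float)
    n_obs = (len(x_path) - 1) // steps_per_obs
    obs_dt = steps_per_obs * fine_dt
    hx = h(x_path[: n_obs * steps_per_obs]).reshape(n_obs, steps_per_obs)
    drift = hx.sum(axis=1) * fine_dt                        # approximates the integral of h(X)
    noise = rng.normal(0.0, np.sqrt(obs_dt), size=n_obs)    # Delta Z_k ~ N(0, Delta t)
    return obs_dt, np.cumsum(drift + noise)
```

A path `x` produced on a grid of step `fine_dt` can be passed directly, for example `obs_dt, y = discrete_observations(x, 1e-3, 100)` for observations every $\Delta t = 0.1$.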




2.2. The Standard Nonlinear Filter. For diffusion processes, nonlinear filtering is implemented
either through numerical integration or through Monte Carlo simulation (such as particle filters; see [UIH]).
In most of the applied literature, it is standard to write a filtering algorithm as a recursive formula for
the posterior distribution of the hidden states as new data arrives, and many filtering recursions (linear as
well as nonlinear) can be derived by an application of Bayes' rule. When applying Bayes' rule, the essential
assumptions are the Markov property and incremental independence of the noise in the observations, both of
which are specified a priori in the model. These two conditions comprise what is referred to as the memoryless
channel [13], which allows the joint transition probabilities to be written as

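$$P\left(y_{k+1},\ X^\epsilon(t_{k+1}) = x,\ \Theta^\epsilon(t_{k+1}) = s_i \mid y_k,\ X^\epsilon(t_k) = x',\ \Theta^\epsilon(t_k) = s_j\right)$$
$$= P\left(y_{k+1} \mid y_k,\ X^\epsilon(t_{k+1}) = x,\ X^\epsilon(t_k) = x'\right) \times \Lambda^\epsilon_{\Delta t}(x, s_i \mid x', s_j).$$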

Given the memoryless channel, a simple application of Bayes' rule yields the recursive formulas for filtering
in discrete time. The formulas for filtering in continuous time can be heuristically derived by passing to the
limit $\Delta t \to 0$ in the discrete filter. In this paper the primary focus is on discretely observed processes, so
all the filtering recursions are straightforward to derive and always come from the Bayesian recursion.
In discrete time, the observed data is collected and retained as a sequence of real numbers from which a
filtering distribution is constructed. At time $t_k$, the observed data is retained in a sequence $\{y_1, \dots, y_k\}$ that
can be interpreted as the event $\{Y^\epsilon_{1:k} = y_{1:k}\}$. For any pair $s_i \in S$ and $x \in \mathbb{R}$, the filtering distribution at
time $t_k$ is then a function of $y_{1:k}$,

$$\pi^\epsilon_k(x, s_i) = \frac{\partial}{\partial x} P\left(X^\epsilon(t_k) \leq x,\ \Theta^\epsilon(t_k) = s_i \mid Y^\epsilon_{1:k} = y_{1:k}\right).$$

The fundamentals of Bayesian filtering are presented in the earlier works by Kushner [17], Jazwinski
[IB], and Baum et al ^ jSj. The Bayesian approach has since been used in many different applications.
Some more recent references include Asmussen and Glynn [1], Cappe, Moulines and Ryden [8], Krylov et
al [13], and Rabiner [22]. Bayesian filtering is usually done with a recursive formula, which we refer to as
the forward Baum-Welch equation. For any $x, x' \in \mathbb{R}$, $i, j \leq M$ and any $t > 0$, the forward Baum-Welch
equation is

$$\pi^\epsilon_{k+1}(x, s_i) = \frac{1}{c_{k+1}} \sum_j \int P\left(y_{k+1} \mid y_k,\ X^\epsilon(t_{k+1}) = x,\ X^\epsilon(t_k) = x'\right) \Lambda^\epsilon_{\Delta t}(x, s_i \mid x', s_j)\, \pi^\epsilon_k(x', s_j)\,dx' \tag{2.6}$$

where $c_{k+1}$ is a normalizing constant, $P(y_{k+1} \mid y_k, X^\epsilon(t_{k+1}) = x, X^\epsilon(t_k) = x')$ is the likelihood function of
$y_{k+1}$ given $(y_k, x, x')$, and $\pi_0(x, s_i)$ is given independent of $\epsilon$.
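The structure of recursion (2.6) can be made concrete on a spatial grid. The sketch below assumes the transition density and the likelihood have already been tabulated on a uniform grid of spacing `dx`; the array layouts and the name `forward_step` are hypothetical choices made for illustration. It shows why the cost grows quickly: the kernel alone has $(NM)^2$ entries for $N$ grid points and $M$ regimes.

```python
import numpy as np

def forward_step(pi_k, kernel, lik, dx):
    """One step of the forward recursion (2.6) on a uniform grid (sketch).

    pi_k:   shape (N, M),       values pi_k(x'_n, s_j)
    kernel: shape (N, M, N, M), values Lambda_dt(x, s_i | x', s_j)
    lik:    shape (N, N),       likelihood of y_{k+1} given (x, x')
    """
    # Sum over the previous regime j and integrate over x' (Riemann sum with weight dx).
    unnorm = np.einsum("xy,xiyj,yj->xi", lik, kernel, pi_k) * dx
    c = unnorm.sum() * dx          # normalizing constant c_{k+1}
    return unnorm / c
```

The returned array integrates to one over the grid, so it can be fed straight back in as `pi_k` for the next observation.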

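The filtering recursion in (2.6) is difficult both to compute and to analyze because of the path-dependence
in the likelihood function,

$$P\left(y_{k+1} \mid y_k,\ X^\epsilon(t_{k+1}) = x,\ X^\epsilon(t_k) = x'\right)$$
$$= E\left[ P\left(y_{k+1} \mid y_k,\ X^\epsilon(\tau)\ \forall \tau \in [t_k, t_{k+1}]\right) \,\Big|\, X^\epsilon(t_{k+1}) = x,\ X^\epsilon(t_k) = x' \right]$$
$$= C \cdot E\left[ \exp\left\{ -\frac{1}{2}\left( \frac{y_{k+1} - y_k - \int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau}{\sqrt{\Delta t}} \right)^2 \right\} \,\Bigg|\, X^\epsilon(t_{k+1}) = x,\ X^\epsilon(t_k) = x' \right]$$

where $C$ is a normalizing constant (dependent on $\Delta t$). This path-dependence can be avoided by adding
to the HMM an extra process that allows the filter to be written with a point-wise-dependent likelihood
function. Denote this process by $V^\epsilon(t)$ and let it satisfy the differential equation

$$dV^\epsilon(t) = h(X^\epsilon(t))\,dt.$$

With the addition of $V^\epsilon(t)$ the observations no longer depend on a path, since

$$Y^\epsilon_{k+1} = Y^\epsilon_k + V^\epsilon(t_{k+1}) - V^\epsilon(t_k) + \Delta Z_k.$$

We call the triplet $(V^\epsilon(t), X^\epsilon(t), \Theta^\epsilon(t)) \in \mathbb{R} \times \mathbb{R} \times S$, which is a Markov process, the augmented HMM with
generator $\frac{1}{\epsilon}\mathcal{L} + Q(x) + h(x)\frac{\partial}{\partial v}$. For any $(v, x, s_i) \in \mathbb{R} \times \mathbb{R} \times S$ and $(v', x', s_j) \in \mathbb{R} \times \mathbb{R} \times S$, we let $\Gamma^\epsilon_t$ denote
the Green's function of the Fokker-Planck equation given by the generator of the augmented HMM,

$$\left(\frac{\partial}{\partial t} - \frac{1}{\epsilon}\mathcal{L}^* - h(x)\frac{\partial}{\partial v}\right)\Gamma^\epsilon_t(v, x, s_i \mid v', x', s_j) = \sum_\ell Q_{\ell i}(x)\,\Gamma^\epsilon_t(v, x, s_\ell \mid v', x', s_j),$$

with $\Gamma^\epsilon_0(v, x, s_i \mid v', x', s_j) = \delta(v - v')\,\delta(x - x')\,\delta_{ij}$.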
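The augmented filter is then

$$\pi^\epsilon_k(v, x, s_i) = \frac{\partial^2}{\partial v\,\partial x} P\left(V^\epsilon(t_k) \leq v,\ X^\epsilon(t_k) \leq x,\ \Theta^\epsilon(t_k) = s_i \mid Y^\epsilon_{1:k} = y_{1:k}\right)$$

for which the forward Baum-Welch equation is

$$\pi^\epsilon_{k+1}(v, x, s_i) = \frac{1}{c_{k+1}} \sum_j \int\!\!\int P\left(y_{k+1} \mid y_k,\ V^\epsilon(t_{k+1}) = v,\ V^\epsilon(t_k) = v'\right) \times \Gamma^\epsilon_{\Delta t}(v, x, s_i \mid v', x', s_j)\, \pi^\epsilon_k(v', x', s_j)\,dx'\,dv' \tag{2.7}$$

where $c_{k+1}$ is another normalizing constant. The advantage of the augmented filter is that the likelihood
function is now a Gaussian,

$$P\left(y_{k+1} \mid y_k,\ V^\epsilon(t_{k+1}) = v,\ V^\epsilon(t_k) = v'\right) = C \cdot \exp\left\{ -\frac{1}{2}\left( \frac{y_{k+1} - y_k - (v - v')}{\sqrt{\Delta t}} \right)^2 \right\},$$

where $C$ is a normalizing constant (dependent on $\Delta t$). It can be readily verified that the marginal posterior
density $\int \pi^\epsilon_k(v, x, s_i)\,dv$ satisfies the Baum-Welch recursion (2.6) with the correct path-dependent likelihood
function.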

2.3. Averaged Filters. The rate at which $X^\epsilon(t)$ reverts to an approximately stationary distribution
is $1/\epsilon$, and as $\epsilon \searrow 0$ we would expect there to be some stationary behavior in the filter provided that the
condition in (2.3) holds. Indeed, this is the case and we give a direct proof in later sections, but for now we
want to provide some intuition about the filter in the limit, and consider some applications.

For any $i \leq M$, let $\mu_i(x)$ be the invariant density such that $\mathcal{L}^*_i \mu_i(x) = 0$, which can be written as

$$\mu_i(x) = \frac{1}{\sqrt{\pi}}\, e^{-(x - s_i)^2}$$

and integrates to one, $\int \mu_i(x)\,dx = 1$, making it a probability density. Let $\bar{Q}$ and $\bar{h} = (\bar{h}_{s_i})_{i \leq M}$ be the averages
under the invariant measure,

$$\bar{Q}_{ij} = \int Q_{ij}(x)\,\mu_i(x)\,dx \qquad \text{and} \qquad \bar{h}_{s_i} = \int h(x)\,\mu_i(x)\,dx \quad \text{for all } i \leq M.$$

Then for a fixed $T < \infty$, the process $(V^\epsilon(t), \Theta^\epsilon(t)) \in \mathbb{R} \times S$, $t \in [0, T]$, converges weakly to the process
$(V(t), \Theta(t)) \in \mathbb{R} \times S$, $t \in [0, T]$, as $\epsilon \searrow 0$.^1 The generator of the limit process $(V(t), \Theta(t))$ is the sum of the
averaged operators $\bar{Q} + \bar{h}\frac{\partial}{\partial v}$, from which it is clear that $(V(t), \Theta(t))$ is a Markov process. In particular, $\Theta(t)$
is a continuous-time Markov chain with transition intensities $\bar{Q}$, and

$$dV(t) = \bar{h}_{\Theta(t)}\,dt.$$

Therefore, $V(t)$ is a deterministic function of the path of $\Theta(t)$, and so the limiting nonlinear filter is of
significantly reduced dimension and is related directly to the regime-determining path.
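Because $\mu_i$ is a Gaussian density with mean $s_i$ and variance $1/2$, the averages $\bar{h}_{s_i}$ and $\bar{Q}_{ij}$ are integrals against the weight $e^{-u^2}$ after the shift $x = s_i + u$, which is exactly the weight used by Gauss-Hermite quadrature. The sketch below uses `numpy.polynomial.hermite.hermgauss`; the function names and the example rate function `Q_fn` are hypothetical and chosen only for illustration.

```python
import numpy as np

def invariant_average(f, s, order=40):
    """Approximate  int f(x) mu(x) dx  with  mu(x) = exp(-(x - s)^2) / sqrt(pi)."""
    nodes, weights = np.polynomial.hermite.hermgauss(order)   # weight function exp(-u^2)
    return float(np.sum(weights * f(s + nodes)) / np.sqrt(np.pi))

def averaged_generator(Q_fn, states, order=40):
    """Row i of Q_bar is row i of Q(x) averaged against mu_i (sketch)."""
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    M = len(states)
    Q_bar = np.zeros((M, M))
    for i, s in enumerate(states):
        for u, w in zip(nodes, weights):
            Q_bar[i] += w * Q_fn(s + u)[i]
    return Q_bar / np.sqrt(np.pi)

def Q_fn(x):
    """Hypothetical x-dependent two-state intensity matrix with bounded rates."""
    rate = 1.0 + 0.5 * np.tanh(x)
    return np.array([[-rate, rate], [rate, -rate]])

if __name__ == "__main__":
    states = [-1.0, 1.0]
    print([invariant_average(np.tanh, s) for s in states])   # h_bar for h = tanh
    print(averaged_generator(Q_fn, states))
```

Since each row of $Q(x)$ sums to zero for every $x$, the rows of the averaged matrix also sum to zero, so $\bar{Q}$ is again an intensity matrix.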

Based on this result we expect the limiting nonlinear filter to be a marginal posterior probability function
$\pi_k$ satisfying the reduced recursion

$$\pi_{k+1}(s_i) = \frac{1}{c_{k+1}} \sum_j \psi_{ij}(y_{k+1} \mid y_k)\left(e^{\bar{Q}^* \Delta t}\right)_{ij} \pi_k(s_j) \tag{2.8}$$

where $\bar{Q}^*$ denotes the matrix transpose of $\bar{Q}$, $c_{k+1}$ is a normalizing constant, and $\psi$ is the likelihood function
that is computed by integrating over all possible paths,

$$\psi_{ij}(y_{k+1} \mid y_k) = E\left[ \exp\left\{ -\frac{1}{2}\left( \frac{y_{k+1} - y_k - \int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau}{\sqrt{\Delta t}} \right)^2 \right\} \,\Bigg|\, \Theta(t_{k+1}) = s_i,\ \Theta(t_k) = s_j \right].$$

The averaged filter in (2.8) is of significantly reduced complexity compared to the filter in (2.6) or the
augmented filter of (2.7). There is path-dependence in the likelihood function, but this is a relatively minor
complication because this dependence is on the path of a finite-state Markov chain over a relatively short
interval. In contrast, the extra two-dimensional state space that we need to deal with in order to filter $V^\epsilon(t_k)$
and $X^\epsilon(t_k)$ adds to the complexity of the filter. Therefore, provided that $\epsilon$ and $\Delta t$ are chosen appropriately,
it is faster and almost as effective to use the averaged filter to compute the posterior of $\Theta^\epsilon(t_k)$. Later sections
include discussion on regimes for $\epsilon$ and $\Delta t$ that make it appropriate to use this asymptotic approximation.
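One way to evaluate the recursion (2.8) is to estimate the product $\psi_{ij}\,(e^{\bar{Q}^*\Delta t})_{ij}$ in a single pass, since it equals the expectation of the Gaussian weight restricted to chain paths that start in $s_j$ and end in $s_i$. The following is a minimal Monte Carlo sketch under that observation; it is not the numerical scheme of this paper, and the names `path_integral`, `averaged_filter_step` and the sample size `n_paths` are assumptions introduced here.

```python
import numpy as np

def path_integral(rng, Q_bar, h_bar, j, dt):
    """Simulate the limit chain from state j over [0, dt].

    Returns the end state and the integral of h_bar along the path.
    """
    t, state, integral = 0.0, j, 0.0
    while True:
        rate = -Q_bar[state, state]
        hold = rng.exponential(1.0 / rate) if rate > 0 else np.inf
        if t + hold >= dt:
            return state, integral + h_bar[state] * (dt - t)
        integral += h_bar[state] * hold
        t += hold
        probs = Q_bar[state].copy()
        probs[state] = 0.0                       # jump to a different state
        state = rng.choice(len(h_bar), p=probs / probs.sum())

def averaged_filter_step(pi_k, dy, Q_bar, h_bar, dt, n_paths=2000, seed=0):
    """One step of (2.8) with dy = y_{k+1} - y_k (Monte Carlo sketch)."""
    rng = np.random.default_rng(seed)
    M = len(pi_k)
    K = np.zeros((M, M))   # K[i, j] estimates psi_ij * P(Theta_{k+1} = s_i | Theta_k = s_j)
    for j in range(M):
        for _ in range(n_paths):
            i, integral = path_integral(rng, Q_bar, h_bar, j, dt)
            K[i, j] += np.exp(-0.5 * (dy - integral) ** 2 / dt)
    unnorm = (K / n_paths) @ np.asarray(pi_k, dtype=float)
    return unnorm / unnorm.sum()
```

The filter state here is a vector of length $M$, in contrast to the grid over $(v, x, s_i)$ needed for (2.7); the Monte Carlo noise in `K` is the price of handling the path-dependence by simulation.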

2.4. Some Examples. Ornstein-Uhlenbeck-type processes with fast mean reversion are used in a variety
of applications. Examples come from target tracking, biology, imaging, finance, etc.

^1 This result is proven here in Section 3; the proof follows well-known general methods ([28], chapter 7) for weak convergence
of processes.


2.4.1. Target-Tracking Models. Regime-switching models and nonlinear filtering are applied to
target-tracking by Bar-Shalom [3] and also by Rozovsky [23]. Suppose that a computerized tracking system
is following an object with measurements made by a video camera. For such a problem we interpret the
variables as follows:

$\Theta^\epsilon(t)$ = a hidden maneuver to be identified,

$X^\epsilon(t)$ = tracked velocity,

where $X^\epsilon(t)$ rapidly relaxes to $\Theta^\epsilon(t)$ but in the presence of significant noise. Observations of the target are
noisy measurements of the target's position,

$$Y^\epsilon_{k+1} = Y^\epsilon_k + \int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau + \Delta Z_k.$$

We introduce an auxiliary field $P^\epsilon(t)$ to denote position, and a change in position of the target is
$P^\epsilon(t_{k+1}) - P^\epsilon(t_k) = \int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau$. Then the discrete-time observation model becomes

$$Y^\epsilon_{k+1} - Y^\epsilon_k = P^\epsilon(t_{k+1}) - P^\epsilon(t_k) + \Delta Z_k,$$

which can be interpreted as relative changes in measured position being equal to the relative change in the
actual position plus noise. The filter for such a model is given by (2.7), and the averaged filter of (2.8) can
be applied as $\epsilon \searrow 0$. Ultimately, if the computerized tracking system can positively identify $\Theta^\epsilon(t_k)$, it will
be possible to have information about the sequence of maneuvers made by the target.

The potential for an averaged filter's success in these video-tracking problems depends on the user's
ability to choose $\Delta t$ such that

$$\min_{i,x} \frac{-1}{Q_{ii}(x)} \gg \Delta t \gg \epsilon.$$

Provided that $Q$ and $\epsilon$ are of different scales, the operator of the video-surveillance system should select a
frame-rate for the camera that is both effective for tracking (i.e. $\min_{i,x}(-1/Q_{ii}(x)) \gg \Delta t$), and consistent
with the fast mean-reverting hypothesis (i.e. $\Delta t \gg \epsilon$).

2.4.2. Ornstein-Uhlenbeck process for Stochastic Volatility. Multi-scale models for stochastic
volatility have been considered extensively by Fouque, Papanicolaou and Sircar [10] and also by Howison
[15]. In this example we introduce filtering to stochastic volatility models of the form

$Y^\epsilon(t)$ = observed log-price

$\Theta^\epsilon(t)$ = stochastic volatility regime

$\sqrt{h(X^\epsilon(t))}$ = stochastic volatility.

Here $X^\epsilon(t)$ is the Ornstein-Uhlenbeck process defined by (2.5), the function $h(x)$ is taken to be continuous,
positive, bounded and monotone, and it is assumed that there are constants $\delta > 0$ and $H < \infty$ such that

$$0 < \delta < h(x) < H < \infty \qquad \forall x \in \mathbb{R}.$$

For example, the exponential OU model of Perello, Sircar and Masoliver [5T] can be modified to fit this
framework.



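Letting $(X^\epsilon(t), \Theta^\epsilon(t))$ satisfy (2.1) and (2.5), we take the logarithm of a stock price to satisfy the following
SDE:

$$dY^\epsilon(t) = \left(r - \frac{1}{2}h(X^\epsilon(t))\right)dt + \sqrt{h(X^\epsilon(t))}\left(\sqrt{\epsilon}\rho\,dW(t) + \sqrt{1 - \epsilon\rho^2}\,dZ(t)\right)$$

where $r$ is a known parameter (the interest rate), $Z(t)$ is a standard Brownian motion independent of $W(t)$,
and $\rho \in (-1, 0)$ models the volatility leverage effect provided that $h'(x) > 0$ for all $x$,

$$\int_0^t dY^\epsilon(\tau) \cdot dh(X^\epsilon(\tau)) = \rho \int_0^t h'(X^\epsilon(\tau))\sqrt{h(X^\epsilon(\tau))}\,d\tau.$$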
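The cross-variation identity above can be checked in a few lines. The sketch below assumes, beyond what is stated in the text, that $h$ is twice continuously differentiable so that Itô's formula applies to $h(X^\epsilon(t))$; only the martingale parts of $dY^\epsilon$ and $dh(X^\epsilon)$ contribute to the product.

```latex
\begin{align*}
dh(X^\epsilon(t)) &= \frac{1}{\sqrt{\epsilon}}\, h'(X^\epsilon(t))\,dW(t) + (\text{terms in } dt), \\
dY^\epsilon(t) &= \sqrt{h(X^\epsilon(t))}\left(\sqrt{\epsilon}\rho\,dW(t) + \sqrt{1-\epsilon\rho^2}\,dZ(t)\right) + (\text{terms in } dt), \\
dY^\epsilon(t)\cdot dh(X^\epsilon(t)) &= \sqrt{\epsilon}\rho \cdot \frac{1}{\sqrt{\epsilon}}\; h'(X^\epsilon(t))\sqrt{h(X^\epsilon(t))}\,dt
 = \rho\, h'(X^\epsilon(t))\sqrt{h(X^\epsilon(t))}\,dt,
\end{align*}
```

using $dW \cdot dW = dt$ and $dW \cdot dZ = 0$. The factors $\sqrt{\epsilon}$ and $1/\sqrt{\epsilon}$ cancel, so the leverage term stays of order one even though the coefficient of $dW$ in the log-price vanishes as $\epsilon \searrow 0$.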

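Note that $\widetilde{W}(t) = \sqrt{\epsilon}\rho\,W(t) + \sqrt{1 - \epsilon\rho^2}\,Z(t)$ is a standard Brownian motion correlated with $\frac{1}{\sqrt{\epsilon}}W$, with correlation
coefficient $\rho$. Since the log price $Y^\epsilon(t)$ is observed discretely (e.g. daily), we write the observation model as
follows:

$$Y^\epsilon_{k+1} - Y^\epsilon_k = r\Delta t - \frac{1}{2}\int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau + \sqrt{\epsilon}\rho\int_{t_k}^{t_{k+1}} \sqrt{h(X^\epsilon(\tau))}\,dW(\tau) + \sqrt{1 - \epsilon\rho^2}\int_{t_k}^{t_{k+1}} \sqrt{h(X^\epsilon(\tau))}\,dZ(\tau)$$

$$=_d\ r\Delta t - \frac{1}{2}\int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau + \sqrt{\epsilon}\rho\int_{t_k}^{t_{k+1}} \sqrt{h(X^\epsilon(\tau))}\,dW(\tau) + \sqrt{(1 - \epsilon\rho^2)\int_{t_k}^{t_{k+1}} h(X^\epsilon(\tau))\,d\tau}\ \cdot Z_k$$

where the equality is in distribution and $Z_k \sim$ iid $N(0, 1)$.

As before, we introduce two auxiliary fields, namely $(V^\epsilon(t), \zeta^\epsilon(t))$, where

$$dV^\epsilon(t) = (1 - \epsilon\rho^2)\,h(X^\epsilon(t))\,dt \qquad \text{and} \qquad d\zeta^\epsilon(t) = \sqrt{\epsilon}\rho\sqrt{h(X^\epsilon(t))}\,dW(t)$$

with initial conditions $V^\epsilon(0) = 0 = \zeta^\epsilon(0)$, for which we see that $(V^\epsilon(\cdot), \Theta^\epsilon(\cdot)) \Rightarrow (V(\cdot), \Theta(\cdot))$ as $\epsilon \searrow 0$, and
$\zeta^\epsilon(t) \to 0$ a.s. as $\epsilon \searrow 0$ for all $t < \infty$. Applying these limits, we have the following weak limit of the
observations,

$$Y^\epsilon_{k+1} - Y^\epsilon_k \Rightarrow r\Delta t - \frac{1}{2}\int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau + \sqrt{\int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau}\ \cdot Z_k,$$

and by the same argument that shows $V^\epsilon(\cdot)$ converges, we have the following averaging of the leverage effect,

$$\int_{t_k}^{t_{k+1}} dY^\epsilon(\tau) \cdot dh(X^\epsilon(\tau)) \Rightarrow \rho\int_{t_k}^{t_{k+1}} \left(\overline{h'\sqrt{h}}\right)_{\Theta(\tau)} d\tau$$

where $\left(\overline{h'\sqrt{h}}\right)_{s_i} = \int h'(x)\sqrt{h(x)}\,\mu_i(x)\,dx$, both limits taken as $\epsilon \searrow 0$.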

The standard nonlinear filter for this problem is more complicated than the one given in equation (2.7),
but we will see later on that a simple generalization of Theorem 3.1 of Section 3 leads to an averaged filter
as $\epsilon \searrow 0$,

$$\pi_{k+1}(s_i) = \frac{1}{c_{k+1}} \sum_j \psi_{ij}(y_{k+1} \mid y_k)\left(e^{\bar{Q}^* \Delta t}\right)_{ij} \pi_k(s_j)$$

where the likelihood function has the form

$$\psi_{ij}(y_{k+1} \mid y_k) = E\left[ \frac{1}{\sqrt{\int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau}} \exp\left\{ -\frac{1}{2}\, \frac{\left( y_{k+1} - y_k - r\Delta t + \frac{1}{2}\int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau \right)^2}{\int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau} \right\} \,\Bigg|\, \Theta(t_{k+1}) = s_i,\ \Theta(t_k) = s_j \right]$$

with $\bar{h}_{s_i} = \int h(x)\,\mu_i(x)\,dx = \frac{1}{\sqrt{\pi}}\int h(x)\,e^{-(x - s_i)^2}\,dx$. We will revisit this example in Section 3 after we
have gone over the rigorous proof of the averaged filter, at which point we will be able to give a detailed
explanation as to why this stochastic volatility filter has the asymptotic averaging that we show here. From
the point of view of stochastic volatility modeling, we see that nonlinear filters for tracking regime changes in
the volatility can be significantly simplified if we have fast mean reversion and if the observation interval is
chosen appropriately.

^2 We write "$\int_0^t dY^\epsilon(\tau) \cdot dh(X^\epsilon(\tau))$" to denote the quadratic cross-variation of $Y^\epsilon(\tau)$ and $h(X^\epsilon(\tau))$.

The potential for the averaged filter to be effective in tracking the state of volatility will depend on the 
data. Time scales in financial data have been observed, with fast mean-reversion happening on the order 
of 1-2 days (or less), and slow mean-reversion happening on the order of weeks and months. Therefore, a 
choice of At corresponding to a few days or a week would allow the averaged filter to track regime-changes 
effectively. It should also be mentioned that there are time-dependent effects in financial data (e.g. daily 
intra-day trading patterns) that should be included in a stochastic volatility model, but the averaged filter 
has the ability to track a time-independent regime-process without estimating/tracking the time-dependent 
components (see [TT]). 

2.5. Issues with e and At. There are some subtle issues in fast mean reversion models and filtering 
that can be easily overlooked. In this subsection we highlight some of them that can arise when seemingly 
small changes are made to the proposed model. They are from a general class of fast mean reverting models 
that may look similar to the model in this paper, but are in fact quite different. In particular their limiting 
filters are not the same. 

2.5.1. Allowing for Relaxation Between Observations. For a fixed $\epsilon > 0$, the choice to implement
the filter in (2.8) is appropriate provided that the interval between observations, $\Delta t$, is chosen correctly. If
$\Delta t$ is too small (i.e. $\Delta t \ll \epsilon$) then there will not be enough time between observations for the
distribution of $X^\epsilon$ to relax. If the averaged filter is applied in such a situation, there will be estimation error
that is not present when the optimal nonlinear filter is used. Therefore, it is important to impose a condition
such as $\Delta t > \epsilon$ to decide if it is appropriate to use the averaged filter in (2.8). In other words, such a condition
ensures that the average time to mean reversion, $\epsilon$, in the diffusion process is shorter than the time between
the arrival of new observations. The interval between observations, $\Delta t$, must also not be too large because then
the filter will be neglectful of regime changes. In applications where there is regime change and fast mean
reversion, and the time scales for these two effects are well separated, the observation time should be chosen
to be intermediate between the two,

$$\epsilon < \Delta t < \min_{i,x} \frac{-1}{Q_{ii}(x)}. \tag{2.9}$$

The above discussion suggests that fast mean-reversion asymptotics cannot be applied to the Zakai
equation [25], [30], which gives the continuous time filter. This is because averaged filters only work when
there is some coarse-graining of the observations (for example, by discretizing the observation time) and
when the rate of mean reversion is faster than the observation sampling rate.
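A quick diagnostic for condition (2.9) is sketched below for the case of a constant intensity matrix. The helper name `separation_report` and the safety margin `factor` are assumptions made here: the text does not quantify how well separated the scales must be, so the default of 10 is purely illustrative.

```python
import numpy as np

def separation_report(eps, dt, Q, factor=10.0):
    """Check eps << dt << min_i(-1/Q_ii) for a constant intensity matrix Q (sketch)."""
    mean_holding = -1.0 / np.diag(np.asarray(Q, dtype=float))   # mean time spent in each regime
    slow_scale = float(mean_holding.min())
    return {
        "slow_scale": slow_scale,
        "relaxes_between_observations": bool(dt >= factor * eps),
        "resolves_regime_changes": bool(factor * dt <= slow_scale),
    }

if __name__ == "__main__":
    Q = [[-0.5, 0.5], [0.5, -0.5]]
    print(separation_report(eps=0.01, dt=0.1, Q=Q))
```

If either flag is false, the averaged filter of (2.8) may be a poor approximation for that choice of $\Delta t$.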

2.5.2. Different Scalings. In this section we present an example to illustrate some subtleties in the
scaling of stochastic differential equations. The example is the focus of the paper by Papanicolaou [12] in
which there is a limiting filter of reduced dimension, but the type of averaging that is used to obtain it is
different and thus the limiting filter is considerably different from that in (2.8).
We consider the following video-tracking model:

$\Theta(t)$ = target velocity

$X^\epsilon(t)$ = camera velocity

$P^\epsilon(t)$ = position error

$Y^\epsilon_k$ = noisy measurement of position error

Suppose the dynamics of $\Theta(t)$ have intensity matrix $Q$ which no longer depends on $X^\epsilon(t)$, and the diffusion
is no longer scaled by $1/\sqrt{\epsilon}$,

$$\frac{d}{dt} P(\Theta(t) = s_i) = \sum_j Q_{ji}\, P(\Theta(t) = s_j)$$

$$dX^\epsilon(t) = \frac{1}{\epsilon}\left(\Theta(t) - X^\epsilon(t)\right)dt + dW(t)$$

$$dP^\epsilon(t) = \frac{1}{\epsilon}\left(\Theta(t) - X^\epsilon(t)\right)dt$$

$$Y^\epsilon_k = h(P^\epsilon(t_k)) + Z_k$$

where $Z_k \sim N(0, 1)$. The main differences between the model here and the model in the rest of the paper are the lack of
a scaling factor before the $dW_t$ term and the addition of a scaling term in the dynamics of the auxiliary field
(in this case we have $P^\epsilon$ as the auxiliary field). In [19], the limit theory used to obtain the averaged filter for
this model is simpler than the theory presented in the rest of this paper, but the asymptotic average of the
filter is of an entirely different nature.
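The filter for this model is

$$\pi^\epsilon_k(p, x, s_i) = \frac{\partial^2}{\partial p\,\partial x} P\left(P^\epsilon(t_k) \leq p,\ X^\epsilon(t_k) \leq x,\ \Theta(t_k) = s_i \mid Y^\epsilon_{1:k} = y_{1:k}\right)$$

and satisfies the following forward Baum-Welch equation:

$$\pi^\epsilon_{k+1}(p, x, s_i) = \frac{1}{c_{k+1}}\, e^{-\frac{1}{2}\left(y_{k+1} - h(p)\right)^2} \sum_j \int\!\!\int \Lambda^\epsilon_{\Delta t}(p, x, s_i \mid p', x', s_j)\, \pi^\epsilon_k(p', x', s_j)\,dx'\,dp'$$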

where $\Lambda^\epsilon_{\Delta t}(\,\cdot\,)$ is the joint transition kernel of the hidden process. By a generalization of the Kramers-Smoluchowski
theorem [13 [H], we have

$$X^\epsilon(t) \to \Theta(t) \quad \text{a.s. pointwise in time,}$$

from which it follows that

$$P^\epsilon(t_{k+1}) - P^\epsilon(t_k) \to \Theta(t_{k+1}) - \Theta(t_k) - \Delta W_k \quad \text{a.s. pointwise in time.}$$

These limits allow us to obtain the following averaged filter in the limit



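$$\pi_{k+1}(p, s_i) = \frac{1}{c_{k+1}}\, e^{-\frac{1}{2}\left(y_{k+1} - h(p)\right)^2} \sum_j \int \frac{1}{\sqrt{2\pi\Delta t}}\, e^{-\frac{\left(p - p' - (s_i - s_j)\right)^2}{2\Delta t}} \left(e^{Q^* \Delta t}\right)_{ij} \pi_k(p', s_j)\,dp'.$$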

This filter is of reduced dimension, but it is not entirely reduced to a filter for the state $\Theta(t_k)$. In fact,
such a fully reduced filter is not possible because the likelihood function has not been averaged. Therefore,
although the theory used to obtain the limit is simpler and the convergence stronger, the limiting filter is
nevertheless more complicated than the averaged filter of (2.8).



3. Rigorous Averaging For Filter. In this section we reintroduce the model in a probabilistic setting,
which will allow us to rigorously prove the limit theorem, giving us the averaged filter of (2.8). The statements
and proofs that build the theory for the averaged filter use pathwise weak convergence and test functions,
and so it is necessary to redefine all the dynamics of the processes in their weak or integrated form. In
particular, convergence of the solutions to a martingale problem is the essential tool in showing that the
processes $(\Theta^\epsilon(t), V^\epsilon(t))$, $0 \leq t \leq T$, converge path-wise in the weak sense.

3.1. Preparation for Theorem. Consider the Markov process $(\Theta^\epsilon(t), X^\epsilon(t), V^\epsilon(t))$ for $\epsilon > 0$ and
$t \in [0, T]$, on the probability space $(\Omega, (\mathcal{F}_t)_{t \leq T}, P)$. Let $\Theta^\epsilon(t)$ be a jump process taking value in the space of
finite values $S = \{s_1, \dots, s_M\}$. The process $X^\epsilon(t)$ is an Ornstein-Uhlenbeck (OU) process with a shifting
mean,

$$dX^\epsilon(t) = \frac{1}{\epsilon}\left(\Theta^\epsilon(t) - X^\epsilon(t)\right)dt + \frac{1}{\sqrt{\epsilon}}\,dW(t) \tag{3.1}$$

where $W(t)$ is an independent, standard Brownian motion and $\epsilon > 0$ is an arbitrarily small parameter. From
this, the third process is defined as

$$V^\epsilon(t) = V_0 + \int_0^t h(X^\epsilon(\tau))\,d\tau$$

where $h : \mathbb{R} \to \mathbb{R}$ is bounded and continuous.
The space $\Omega$ can be defined as

$$\Omega = D([0, T]; S) \times C([0, T]; \mathbb{R}^2)$$

where $D([0, T]; S)$ can be taken here to be the set of right-continuous piece-wise constant functions in
$S$, $C([0, T]; \mathbb{R}^2)$ are the continuous functions in $\mathbb{R}^2$, and $\mathcal{F}_t$, $0 \leq t \leq T$, is the associated filtration. There is
a matrix operator $Q$ where

$$Q(x) = (Q_{ij}(x)) \in \mathbb{R}^{M \times M}$$

which is a transition rate matrix for all $x \in \mathbb{R}$ such that for any bounded function $g(\theta) : S \to \mathbb{R}$,

$$Q(x)g(\theta)\Big|_{\theta = s_i} = \sum_j Q_{ij}(x)\,g(s_j).$$

We assume the conditions set forth in (2.3) and (2.4) of Section 2 hold (with constants $0 < \alpha < \beta < \infty$) so
that $\Theta^\epsilon(t)$ communicates on all possible states of $S$, that jumps cannot occur at an infinite rate, and that
there are no cemetery states. We also take the generators $\mathcal{L}_i$ to be the same as they were in Section 2,
and we take the matrix of these operators $\mathcal{L}$ to be the same as well. Then for any scalar-valued function
$g(s, x, v)$ with compact support, two derivatives in $x$ and one in $v$ (both bounded), the dynamics of the
Markov process $(\Theta^\epsilon(t), X^\epsilon(t), V^\epsilon(t))$ are determined by the martingale property



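$$E\left[ g(\Theta^\epsilon(t), X^\epsilon(t), V^\epsilon(t)) - \int_0^t \left( \frac{1}{\epsilon}\mathcal{L} + Q(X^\epsilon(\tau)) + h(X^\epsilon(\tau))\frac{\partial}{\partial v} \right) g(\Theta^\epsilon(\tau), X^\epsilon(\tau), V^\epsilon(\tau))\,d\tau \,\Bigg|\, \mathcal{F}_s \right]$$

$$= g(\Theta^\epsilon(s), X^\epsilon(s), V^\epsilon(s)) - \int_0^s \left( \frac{1}{\epsilon}\mathcal{L} + Q(X^\epsilon(\tau)) + h(X^\epsilon(\tau))\frac{\partial}{\partial v} \right) g(\Theta^\epsilon(\tau), X^\epsilon(\tau), V^\epsilon(\tau))\,d\tau \tag{3.2}$$

holding for any test function $g$ and $0 \leq s \leq t \leq T$. Equation (3.2) is a weak form of the forward equation
for the Markov process $(\Theta^\epsilon(t), X^\epsilon(t), V^\epsilon(t))$.

Observations on $(\Theta^\epsilon(t), X^\epsilon(t), V^\epsilon(t))$ are given at the discrete times $t_k = k\Delta t$ by the process $Y^\epsilon_k$,

$$Y^\epsilon_{k+1} = Y^\epsilon_k + V^\epsilon(t_{k+1}) - V^\epsilon(t_k) + \Delta Z_k,$$

where $\Delta Z_k \sim$ iid $N(0, \Delta t)$.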

3.2. Theorem for Averaged Filter. We consider a sequence $y_{1:k} = \{y_1, \dots, y_k\}$ to be the observed realization of $Y^\epsilon_{1:k}$, for which we have the following asymptotic theory for the filter.

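Theorem 3.1. Assume (2.3) is true. Then for any bounded function $g : S \to \mathbb{R}$, point-wise for any $y_{1:k} = \{y_1, y_2, \dots, y_k\} \in \mathbb{R}^k$, the optimal filter has the following limit as $\epsilon \searrow 0$:

\[ E\left[ g(\Theta^\epsilon(t_k)) \,\middle|\, Y^\epsilon_{1:k} = y_{1:k} \right] \to \sum_i g(s_i)\,\pi_k(s_i) \quad \text{as } \epsilon \searrow 0, \]

where

\[ \pi_{k+1}(s_i) = \frac{1}{c_{k+1}} \sum_j \psi_{ij}(y_{k+1}|y_k) \left(e^{\bar{Q}^*\Delta t}\right)_{ij} \pi_k(s_j) \tag{3.3} \]

with

\[ \psi_{ij}(y_{k+1}|y_k) = E\left[ \exp\left\{ -\frac{1}{2\Delta t}\left( y_{k+1} - y_k - \int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau \right)^2 \right\} \,\middle|\, \Theta(t_{k+1}) = s_i,\ \Theta(t_k) = s_j \right] \]

and $c_{k+1}$ being a normalizing constant.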
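The recursion for $\pi_{k+1}$ in Theorem 3.1 can be sketched in a few lines of Python. This is a minimal illustration, not an implementation taken from the paper: the function name `averaged_filter_step`, the nested-list representation, and the convention that `trans[i][j]` is the probability of the chain moving from $s_j$ to $s_i$ over one observation interval are assumptions made here, and the likelihood factors $\psi_{ij}$ are taken as given inputs rather than computed.

```python
def averaged_filter_step(pi_prev, psi, trans):
    """One step of pi_{k+1}(s_i) = (1/c) * sum_j psi[i][j] * trans[i][j] * pi_prev[j].

    pi_prev : list, posterior probabilities pi_k(s_j)
    psi     : psi[i][j], likelihood factor psi_ij(y_{k+1} | y_k), supplied by the caller
    trans   : trans[i][j], assumed to be P(Theta(t_{k+1}) = s_i | Theta(t_k) = s_j)
    """
    m = len(pi_prev)
    unnormalized = [
        sum(psi[i][j] * trans[i][j] * pi_prev[j] for j in range(m))
        for i in range(m)
    ]
    c = sum(unnormalized)  # normalizing constant c_{k+1}
    return [u / c for u in unnormalized]


# Example with two states and hypothetical numbers (columns of trans sum to 1).
posterior = averaged_filter_step(
    [0.5, 0.5],
    [[1.0, 1.0], [0.5, 0.5]],
    [[0.9, 0.2], [0.1, 0.8]],
)
```

The normalization by $c_{k+1}$ means any factor of $\psi_{ij}$ that is common to all $(i,j)$ can be dropped without changing the result.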



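One main component of the proof of Theorem 3.1 is the path-wise weak limit of $(\Theta^\epsilon(\cdot), V^\epsilon(\cdot))$,

\[ (\Theta^\epsilon(\cdot), V^\epsilon(\cdot)) \Rightarrow (\Theta(\cdot), V(\cdot)) \tag{3.4} \]

where $(\Theta(\cdot), V(\cdot))$ is a Markov process with generator $\bar{Q} + \mathrm{diag}(\bar{h})\frac{\partial}{\partial v}$. Appendix A goes through the steps proving that the limit in (3.4) holds given the setup of Section 3.1.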



The other component of the proof of Theorem 3.1 is the use of the unnormalized posterior. The kernel used to obtain the unnormalized posterior is a likelihood function, and so in spirit it is similar to the change of measure used in the Zakai equation (in the case of continuous-time observations), as was done by Zakai [30], and is also a part of Rozovsky's proof of the uniqueness of solutions to the Zakai equation [25].

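Proof of Theorem 3.1. Without loss of generality, let $\Delta t = 1$. For any vector $(y_1, \dots, y_k) \in \mathbb{R}^k$, define the likelihood function of $V^\epsilon(\cdot)$,

\[ \mathcal{M}_k(y_{1:k}; V^\epsilon(\cdot)) = \prod_{\ell=1}^{k-1} \exp\left\{ -\frac{1}{2}\Big( y_{\ell+1} - y_\ell - \big(V^\epsilon(t_{\ell+1}) - V^\epsilon(t_\ell)\big) \Big)^2 \right\}. \]

The posterior expectation can be written as a function of the observed data in terms of $\mathcal{M}_k$,

\[ E\left[ g(\Theta^\epsilon(t_k)) \,\middle|\, Y^\epsilon_{1:k} = y_{1:k} \right] = \frac{E\left[\mathcal{M}_k(y_{1:k}; V^\epsilon(\cdot))\, g(\Theta^\epsilon(t_k))\right]}{E\left[\mathcal{M}_k(y_{1:k}; V^\epsilon(\cdot))\right]}. \]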

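We can now apply the weak convergence theorem to the ratio on the right. From the weak limit of $(\Theta^\epsilon(\cdot), V^\epsilon(\cdot))$ in (3.4), this is equal to a function of $y_{1:k}$ which converges point-wise,

\[ \frac{E\left[\mathcal{M}_k(y_{1:k}; V^\epsilon(\cdot))\, g(\Theta^\epsilon(t_k))\right]}{E\left[\mathcal{M}_k(y_{1:k}; V^\epsilon(\cdot))\right]} \to \frac{E\left[\mathcal{M}_k(y_{1:k}; V(\cdot))\, g(\Theta(t_k))\right]}{E\left[\mathcal{M}_k(y_{1:k}; V(\cdot))\right]} \]

as $\epsilon \searrow 0$. Letting $Y_k$ denote the process

\[ Y_{k+1} = Y_k + \int_{t_k}^{t_{k+1}} \bar{h}_{\Theta(\tau)}\,d\tau + \Delta Z_k, \]

it is easily seen that $\mathcal{M}_k(y_{1:k}; V(\cdot))$ is the likelihood function associated with the filtering problem as $\epsilon \searrow 0$. Therefore, the limit ratio can be identified with a conditional expectation and we have

\[ E\left[ g(\Theta^\epsilon(t_k)) \,\middle|\, Y^\epsilon_{1:k} = y_{1:k} \right] \to E\left[ g(\Theta(t_k)) \,\middle|\, Y_{1:k} = y_{1:k} \right] \]

as $\epsilon \searrow 0$, pointwise for $y_{1:k}$.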

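In connection with the Baum-Welch equation, the posterior expectation is written as

\[ E\left[ g(\Theta(t_k)) \,\middle|\, Y_{1:k} = y_{1:k} \right] = \sum_i g(s_i)\,\pi_k(s_i) \]

where the posterior distribution $\pi_k$ is given recursively as

\[
\begin{aligned}
\pi_k(s_i) &= \frac{1}{c_k} \sum_j \frac{\partial}{\partial y} P\big(Y_k \le y \,\big|\, Y_{k-1} = y_{k-1},\ \Theta(t_k) = s_i,\ \Theta(t_{k-1}) = s_j\big)\Big|_{y = y_k} \left(e^{\bar{Q}^*\Delta t}\right)_{ij} \pi_{k-1}(s_j) \\
&= \frac{1}{c_k} \sum_j \psi_{ij}(y_k|y_{k-1}) \left(e^{\bar{Q}^*\Delta t}\right)_{ij} \pi_{k-1}(s_j).
\end{aligned}
\]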



This completes the proof of the theorem.

3.3. Stochastic Volatility Example (Revisited). We now provide some details to show why Theorem 3.1 applies to the stochastic volatility model of Section 2.4.2. Recall that the example had two hidden states

$\Theta^\epsilon(t)$ = the hidden volatility regime
$X^\epsilon(t)$ = the hidden volatility process

with two auxiliary fields

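\[ dV^\epsilon(t) = (1 - \epsilon\rho^2)\,h(X^\epsilon(t))\,dt \quad \text{and} \quad dC^\epsilon(t) = \sqrt{\epsilon}\,\rho\sqrt{h(X^\epsilon(t))}\,dW(t), \]

where $V^\epsilon(0) = 0 = C^\epsilon(0)$, and observations were given by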



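where $Z_k \sim \text{iid } N(0,1)$. It was also mentioned in Section 2.4.2 that path-wise convergence holds for $\Theta^\epsilon(\cdot)$ and the auxiliary fields,

\[ (C^\epsilon(\cdot), V^\epsilon(\cdot), \Theta^\epsilon(\cdot)) \Rightarrow (0, V(\cdot), \Theta(\cdot)) \]

as $\epsilon \searrow 0$.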



Now, in order to apply the steps in the proof of theorem 3.1 the likelihood function is defined as 

I 1 f Vi^T-Vi-uAt+i(V'(tiMT)-V'(ti))-C'(ti^T)-C'(ti)'^ 

k-1 exp ■ 



y/iV-{U + i)-V'iU))At 



4 = 1 



{V^{U+i)-V^iU))At 



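and because $0 < \delta \le h(x) \le H < \infty$, we have that $\mathcal{M}_k(y_{1:k}; V^\epsilon(\cdot), C^\epsilon(\cdot))$ is bounded for all $t_k < \infty$. Then, using the weak convergence we have that for any bounded function $g : S \to \mathbb{R}$,

\[ E\left[ g(\Theta^\epsilon(t_k)) \,\middle|\, Y^\epsilon_{1:k} = y_{1:k} \right] \to \sum_i g(s_i)\,\pi_k(s_i) \]

as $\epsilon \searrow 0$, where $\pi_k$ is given by Theorem 3.1 but with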



'^ijiVi+ilVi) =E 



exp 



1 I yi + i— yi-pAt+i /t*+^ /ie(^)dT 



^J{i:r ^e(.,dr) 



At 



y (//;*' ^e(r)'^^) 



At 



for all $\ell \le k - 1$.

One disadvantage to this averaged filter is that $\rho$ does not appear. Leverage is present for all $\epsilon > 0$, but depends on the path of $X^\epsilon$. Therefore, the averaged filter is useful only in settings where volatility's regime appears independent of returns. For instance, suppose option prices have independent short-term dynamics, but the common level of volatility remains constant for extended periods and does not have any short-term effects on returns. If leverage is a considerable concern, it can have an impact on the averaged filter if the model includes a slowly-scaled OU process which depends on $\Theta$ (see Fouque, Papanicolaou and Sircar [10]). Another option is to add jumps in a manner that is similar to Bakshi, Cao and Chen [5],

\[ dY^\epsilon(t) = \left(\mu - \tfrac{1}{2}h(X^\epsilon(t))\right)dt + \sqrt{h(X^\epsilon(t))}\,dW(t) - J(t)\,d\Theta^\epsilon(t) - \nu\,dt \]

where $d\Theta^\epsilon(t) = \Theta^\epsilon(t) - \Theta^\epsilon(t^-)$, $J(t)$ is a random variable that depends on the sign of $d\Theta^\epsilon(t)$, and $\nu$ is a compensator.

4. Numerical Simulations. In this section we carry out some numerical simulations of the proposed model and implement the filtering algorithms. Methods for simulating the HMM and computing the filter are described in detail, and simulated results verify some basic properties of the averaged filter. In particular, Sections 4.1 and 4.2 describe the numerical methods, and Sections 4.3 and 4.4 explore how the filter performs for various $\epsilon$ and $\Delta t$.

In estimating the hidden states, it is optimal to use the filter from Section 2.2, referred to as the 'optimal filter', but Theorem 3.1 proved that the averaged filter is equivalent as $\epsilon \searrow 0$. Section 4.3 will focus on how the averaged filter compares to the optimal filter, and for this reason the simulations use a simplified linear model under which it is easier to compute the optimal filter. The main conclusions are

1. that the averaged filter is indeed optimal as $\epsilon \searrow 0$,

2. that $\Delta t$ needs to be chosen within a certain range depending on the mean time of reversion and the expected switching times of the hidden Markov chain, as in (2.9).

4.1. Generating the Simulation Data. The simulation occurs over a time interval $[0,T]$, and given time-step $\Delta t$ we will have observations $Y^\epsilon_k$ occurring at a set of equidistant time-points $\{t_k\}_{k=0}^N$,

\[ 0 = t_0 < t_1 < t_2 < \dots < t_N = T \]

with $t_{k+1} = t_k + \Delta t$. Given the linear observation function $h(x) = h \cdot x$, the observations are given simply by

\[ Y^\epsilon_{k+1} = Y^\epsilon_k + h \int_{t_k}^{t_{k+1}} X^\epsilon(s)\,ds + \Delta Z_k \]

where $\Delta Z_k = Z(t_{k+1}) - Z(t_k)$. The task of simulating consists of two parts: (1) how to generate the hidden states, and (2) how to generate the observations.

We'll assume the dynamics of $\Theta(t)$ are homogeneous (i.e. they are not affected by $X^\epsilon(t)$; $Q$ does not have an '$x$' argument). To simulate the processes, we need to select a finer time-step, which we will call $\delta t$. For some positive integer $m$, we define $\delta t$ as

\[ \delta t = \frac{1}{m}\Delta t, \quad \text{for some } m \in \mathbb{Z}^+. \]

Thus, the observations are $m$-many $\delta t$'s apart, and the simulated state processes are generated at the $m$-many points in between,

\[ t_{\ell+1} = t_\ell + \delta t \quad \text{for } \ell = 0, 1, 2, \dots, m \cdot N - 1. \]

At each time $t_\ell$, a discrete-time Markov process $(\Theta_\ell, X^\epsilon_\ell)$ is sampled sequentially according to Algorithm 1.

Algorithm 1 Sequential Sampling of State-Processes

1. set $a = \exp(-\delta t/\epsilon)$.

2. set the matrix $p = [p_{ij}]$ where $p_{ij} = \left(\exp\{\delta t\,Q^*\}\right)_{ij}$.

3. Sequentially sample the states:
for $\ell = 0 \to m \cdot N - 1$ do

a. $\Theta_{\ell+1} \leftarrow \theta$, where $\theta$ is sampled from $p$ conditional on $\Theta_\ell$,

b. $X^\epsilon_{\ell+1} \leftarrow a X^\epsilon_\ell + (1-a)\Theta_\ell + \sqrt{\frac{1-a^2}{2}}\,W_\ell$ where $W_\ell \sim \text{iid } N(0,1)$.
end for
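The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it is restricted to a two-state chain so that the transition probabilities over a step $\delta t$ have a closed form (used here in place of a general matrix exponential), the chain starts in the first state, and the names `two_state_transition`, `sample_states`, `rate_01` and `rate_10` are hypothetical. Note also that the helper stores probabilities as `p[j][i]` (from state `j` to state `i`), which is a choice made for the code and not the paper's index convention.

```python
import math
import random


def two_state_transition(rate_01, rate_10, dt):
    """Closed-form transition probabilities of a two-state chain over a step dt.

    Returns p with p[j][i] = P(next state index = i | current state index = j).
    """
    total = rate_01 + rate_10
    decay = math.exp(-total * dt)
    p01 = rate_01 / total * (1.0 - decay)
    p10 = rate_10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def sample_states(states, rate_01, rate_10, eps, dt_fine, n_steps, x0, rng):
    """Sequentially sample (Theta_l, X_l) for l = 0..n_steps on the fine grid."""
    a = math.exp(-dt_fine / eps)                          # step 1
    p = two_state_transition(rate_01, rate_10, dt_fine)   # step 2 (two-state case)
    noise_sd = math.sqrt((1.0 - a * a) / 2.0)
    idx = 0                                               # assumed initial regime
    theta = [states[idx]]
    x = [x0]
    for _ in range(n_steps):                              # step 3
        # b. OU update driven by the current regime Theta_l
        x_next = a * x[-1] + (1.0 - a) * theta[-1] + noise_sd * rng.gauss(0.0, 1.0)
        # a. regime update, sampled conditional on Theta_l
        if rng.random() < p[idx][1 - idx]:
            idx = 1 - idx
        theta.append(states[idx])
        x.append(x_next)
    return theta, x


theta_path, x_path = sample_states(
    [-3.333, 3.333], 10.0, 5.0, 0.01, 0.0001, 1000, 0.0, random.Random(0)
)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; the rates and step sizes in the usage line are illustrative values only.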

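Given the discrete process from Algorithm 1 we approximate $(\Theta(\cdot), X^\epsilon(\cdot))$ with a piece-wise constant process $(\hat\Theta(\cdot), \hat{X}^\epsilon(\cdot))$, defined as follows:

\[ (\hat\Theta(t), \hat{X}^\epsilon(t)) = \sum_{\ell=0}^{m \cdot N - 1} (\Theta_\ell, X^\epsilon_\ell)\,\mathbf{1}_{[t_\ell, t_{\ell+1})}(t). \tag{4.1} \]

The process in (4.1) converges path-wise in the weak sense to $(\Theta(\cdot), X^\epsilon(\cdot))$ as $m \to \infty$ (the proof of this convergence is similar to the convergence proofs in Appendix A.2 and is also covered in [9]).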



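Finally, we approximate the observations with Riemann sums,

\[ \hat{Y}^\epsilon_{k+1} = \hat{Y}^\epsilon_k + \delta t\,h \sum_{\ell = mk+1}^{m(k+1)} X^\epsilon_\ell + \Delta Z_k \quad \text{for } k = 0, 1, 2, \dots, N-1. \tag{4.2} \]

Indeed, it can be further proved that $\hat{Y}^\epsilon \Rightarrow Y^\epsilon$ as $m \to \infty$, and so our simulations will assume that $\hat{Y}^\epsilon = Y^\epsilon$ and choose $m$ so that the Riemann sum is close to an integral.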
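The Riemann-sum observations can be generated from a fine-grid path of $X^\epsilon$ along the following lines. This is a sketch only: the function name `simulate_observations`, the default starting value `y0 = 0.0`, and the list layout `x_fine[0..m*N]` are assumptions made for the example.

```python
import math
import random


def simulate_observations(x_fine, h, m, dt_fine, rng, y0=0.0):
    """Build Y_0..Y_N from fine-grid samples x_fine[0..m*N] via a Riemann sum."""
    n_obs = (len(x_fine) - 1) // m
    dt_obs = m * dt_fine                       # observation spacing Delta t
    y = [y0]
    for k in range(n_obs):
        # sum over l = m*k + 1, ..., m*(k+1)
        riemann = dt_fine * h * sum(x_fine[m * k + 1 : m * (k + 1) + 1])
        dz = rng.gauss(0.0, math.sqrt(dt_obs))  # Delta Z_k ~ N(0, Delta t)
        y.append(y[-1] + riemann + dz)
    return y


# Usage with a hypothetical constant path and a seeded generator.
y_path = simulate_observations([1.0] * 11, 10.0, 5, 0.0001, random.Random(0))
```

Increasing `m` while holding `m * dt_fine` fixed refines the Riemann sum without changing the observation spacing.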



4.2. Computing the Filter. Suppose that we have used Algorithm 1 and (4.2) to generate the observed process $Y^\epsilon_{1:k} = y_{1:k}$. Algorithm 1 is used again to generate samples for a particle filter (see [UIH]) or a Rao-Blackwellized filter (see [14, 27]),

\[ \left\{\Theta^{(n)}_\ell, X^{\epsilon,(n)}_\ell\right\}_{\ell=0}^{m \cdot N} \quad \text{for } n = 1, 2, 3, \dots, R \]

where $R$ is the number of samples generated. The particle filter consists of a set of weights $\{\omega_k^{(n)}\}_{n=1}^R$ such that for $R$ large we have the following approximation:

\[ E\left[ g(\Theta(t_k), X^\epsilon(t_k)) \,\middle|\, Y^\epsilon_{1:k} = y_{1:k} \right] \approx \sum_{n=1}^R g\big(\Theta^{(n)}_{mk}, X^{\epsilon,(n)}_{mk}\big)\,\omega_k^{(n)} \quad \text{for } k = 0, 1, 2, \dots, N, \]

where the weights are computed as

\[ \omega_{k+1}^{(n)} = \frac{1}{c_{k+1}} \exp\left\{ -\frac{1}{2}\left( \frac{y_{k+1} - y_k - \delta t\,h \sum_{\ell=mk+1}^{m(k+1)} X_\ell^{\epsilon,(n)}}{\sqrt{\Delta t}} \right)^2 \right\} \omega_k^{(n)} \]

with $c_{k+1}$ being a normalizing constant. This algorithm of sampling and weighting is usually accompanied by sampling-importance-resampling (SIR), which basically replaces the samples with bootstrap samples if relative importance starts to favor a select few particles. For instance, if the number of important particles falls below some critical level we should perform SIR,

\[ \text{if } \frac{1}{\sum_n \big(\omega_{k+1}^{(n)}\big)^2} < \eta R, \text{ then perform SIR}, \]

where $\eta \in [0,1]$ is the required fraction of important particles. SIR involves the following: (1) for each $n$ we sample a random variable $V_n$ from $\big\{(\Theta^{(n')}_{m(k+1)}, X^{\epsilon,(n')}_{m(k+1)})\big\}_{n'=1}^R$ with probabilities $\big\{\omega^{(n')}_{k+1}\big\}_{n'=1}^R$, (2) after $R$-many $V_n$'s have been obtained, set

\[ \big(\Theta^{(n)}_{m(k+1)}, X^{\epsilon,(n)}_{m(k+1)}\big) = V_n, \quad \omega^{(n)}_{k+1} = 1/R \quad \text{for each } n. \]

Basically, SIR is a variance reduction technique that will prevent the particle filter from relying on relatively few samples. Algorithm 2 summarizes the steps taken to recursively compute the particle filter upon the arrival of the $(k+1)$th observation.

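Algorithm 2 Recursive Particle Filter Algorithm at $(k+1)$th observation
for $n = 1 \to R$ do
for $\ell = 0 \to m - 1$ do

Generate $\big(\Theta^{(n)}_{mk+\ell+1}, X^{\epsilon,(n)}_{mk+\ell+1}\big)$ given $\big(\Theta^{(n)}_{mk+\ell}, X^{\epsilon,(n)}_{mk+\ell}\big)$
end for
Compute unnormalized weight given $y_{k+1} - y_k$, $\big(\Theta^{(n)}_{m(k+1)}, X^{\epsilon,(n)}_{m(k+1)}\big)$, and $\omega_k^{(n)}$:

$\tilde\omega^{(n)}_{k+1} = \exp\left\{ -\frac{1}{2}\left( \frac{y_{k+1} - y_k - \delta t\,h \sum_{\ell=mk+1}^{m(k+1)} X_\ell^{\epsilon,(n)}}{\sqrt{\Delta t}} \right)^2 \right\} \omega_k^{(n)}$

end for

set $c_{k+1} = \sum_n \tilde\omega^{(n)}_{k+1}$

set $\omega_{k+1} = \tilde\omega_{k+1}/c_{k+1}$

if $1/\sum_n \big(\omega^{(n)}_{k+1}\big)^2 < \eta R$ then

replace $\big\{(\Theta^{(n)}_{m(k+1)}, X^{\epsilon,(n)}_{m(k+1)})\big\}_{n=1}^R$ with an $\{\omega^{(n)}_{k+1}\}_{n=1}^R$-weighted bootstrap sample of size $R$

replace $\{\omega^{(n)}_{k+1}\}_{n=1}^R$ with $(1, 1, \dots, 1)/R$
end if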
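The resampling test at the end of Algorithm 2 can be sketched as below. The function names `effective_sample_size` and `resample_if_needed` are hypothetical, and the weights are assumed to be already normalized; the bootstrap draw uses the standard-library `random.Random.choices`.

```python
import random


def effective_sample_size(weights):
    """1 / sum_n (w_n)^2 for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def resample_if_needed(particles, weights, eta, rng):
    """Apply SIR when the effective sample size drops below eta * R."""
    r = len(particles)
    if effective_sample_size(weights) >= eta * r:
        return particles, weights          # enough important particles; keep as is
    # weighted bootstrap sample of size R, then reset to uniform weights
    resampled = rng.choices(particles, weights=weights, k=r)
    return resampled, [1.0 / r] * r


# Degenerate weights trigger resampling; uniform weights do not.
new_particles, new_weights = resample_if_needed(
    ["a", "b", "c", "d"], [1.0, 0.0, 0.0, 0.0], 0.5, random.Random(1)
)
```

With uniform weights the effective sample size equals $R$, and with all mass on one particle it equals 1, so the threshold $\eta R$ interpolates between never and always resampling.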

Convergence of the particle filter to the optimal filter as $R \to \infty$ can be slow, particularly if $X$ is a
multi-dimensional OU process. Rao-Blackwellization can speed things up dramatically if a large portion of 
the hidden state's dimensionality is conditionally linear, and if the function h is linear. However, if h is 
nonlinear but e is small, then the averaged-filter will be close to optimal, will be faster to compute, and its 
particle filter will converge faster. 

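In fast mean-reverting regimes, we take weights $\bar\omega$ given by

\[ \bar\omega_{k+1}^{(n)} = \frac{1}{\bar{c}_{k+1}} \exp\left\{ -\frac{1}{2}\left( \frac{y_{k+1} - y_k - \delta t\,h \sum_{\ell=mk+1}^{m(k+1)} \Theta_\ell^{(n)}}{\sqrt{\Delta t}} \right)^2 \right\} \bar\omega_k^{(n)} \]

where $\bar{c}_{k+1}$ is a normalizing constant. The averaged filter is

\[ E\left[ g(\Theta(t_k)) \,\middle|\, Y^0_{1:k} = y_{1:k} \right] \approx \sum_{n=1}^R g\big(\Theta^{(n)}_{mk}\big)\,\bar\omega_k^{(n)} \quad \text{for } k = 0, 1, 2, \dots, N. \]

The filter computed with the $\bar\omega$'s is appealing because it does not require samples of $X^\epsilon$. Thus, fewer samples are required and the filter will converge at a faster rate with $R$.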
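A minimal sketch of the averaged weight update, assuming the linear observation function $h(x) = h \cdot x$ of this section so that the sampled regime values replace the samples of $X^\epsilon$ in the Riemann sum. The function name `averaged_weight_update` and the argument layout (one list of $m$ regime samples per particle) are assumptions made for illustration.

```python
import math


def averaged_weight_update(weights, theta_blocks, dy, h, dt_fine):
    """Update averaged-filter weights from y_{k+1} - y_k using only Theta samples.

    weights      : normalized weights at step k, one per particle
    theta_blocks : theta_blocks[n] holds the m regime samples of particle n
                   between the k-th and (k+1)-th observation
    dy           : observed increment y_{k+1} - y_k
    """
    m = len(theta_blocks[0])
    dt_obs = m * dt_fine
    unnormalized = []
    for w, block in zip(weights, theta_blocks):
        predicted = dt_fine * h * sum(block)   # Riemann sum with X replaced by Theta
        z = (dy - predicted) / math.sqrt(dt_obs)
        unnormalized.append(math.exp(-0.5 * z * z) * w)
    c = sum(unnormalized)                      # normalizing constant
    return [u / c for u in unnormalized]


# Two particles sitting in opposite regimes; the increment favors the first.
updated = averaged_weight_update([0.5, 0.5], [[1.0] * 5, [-1.0] * 5], 5.0, 10.0, 0.1)
```

Because no OU samples are propagated, each particle only carries a path of the finite-state chain, which is what makes this update cheaper than the weight update of Algorithm 2.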



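4.3. Filtering Error as a Function of $\epsilon$. For any $\epsilon > 0$, we define $\mathbb{E}^\epsilon[\cdot]$ to be the expectation operator associated with the $P^\epsilon$-measure. Then, we consider two estimates of $\Theta(t_k)$ among all the $\mathcal{F}^Y_k$-measurable functions: an optimal and an averaged,

\[ \hat\Theta_k^{\text{opt}} = \arg\max_{s_i} \pi_k^\epsilon(s_i) = \arg\min_{g \in \mathcal{F}^Y_k} \mathbb{E}^\epsilon \mathbf{1}_{\{\Theta(t_k) \ne g\}}, \]

\[ \hat\Theta_k^{\text{avg}} = \arg\max_{s_i} \pi_k(s_i). \]

For a given estimator $\hat\Theta_k$, the 0-1 error is defined and approximated as

\[ \text{error} = \mathbb{E}^\epsilon \mathbf{1}_{\{\Theta(t_k) \ne \hat\Theta_k\}} \approx \frac{1}{N} \sum_{k=1}^N \mathbf{1}_{\{\Theta(t_k) \ne \hat\Theta_k\}} \quad \text{for } N \text{ large}. \]

From the theory presented in the sections leading up to now, we should find that the 0-1 error of the averaged estimate converges to the error of the optimal estimate as $\epsilon \searrow 0$. The estimators are computed with the following approximated $\pi$'s:

\[ \pi_k^\epsilon(s_i) \approx \sum_{n=1}^R \omega_k^{(n)}\,\mathbf{1}_{\{\Theta^{(n)}_{mk} = s_i\}} \]

and

\[ \pi_k(s_i) \approx \sum_{n=1}^R \bar\omega_k^{(n)}\,\mathbf{1}_{\{\Theta^{(n)}_{mk} = s_i\}}. \]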
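The two quantities above, the maximum-posterior estimate and the empirical 0-1 error, can be sketched as follows. The function names `map_estimate` and `zero_one_error` are hypothetical, and the posterior is assumed to be supplied as a list aligned with the list of states.

```python
def map_estimate(states, posterior):
    """Return the state s_i with the largest posterior probability."""
    best = max(range(len(states)), key=lambda i: posterior[i])
    return states[best]


def zero_one_error(true_path, estimates):
    """Fraction of observation times at which the estimate misses the true state."""
    misses = sum(1 for t, e in zip(true_path, estimates) if t != e)
    return misses / len(true_path)


# Hypothetical example: one miss in four observation times.
estimate = map_estimate([-3.333, 3.333], [0.2, 0.8])
error = zero_one_error([1, 1, 2, 2], [1, 2, 2, 2])
```

The same `zero_one_error` call applies to either estimator; only the posterior passed to `map_estimate` differs between the optimal and the averaged filter.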

The figures in this section and the next are generated with the following parameterization of the problem: 
At = .01, iV = 100,000 5= [-3.333,3.333], a = 10, /3 = 5 

~^ ^p ) X(0)^;7hl,l], h{x)^lQ-x. 



For $R = 100$ and $m = 5$ (so that $\Delta t = .0005$), Figure 4.1 shows how the 0-1 errors of the averaged and optimal filters converge. The figure was computed using the Rao-Blackwellized filter described in Appendix B rather than the particle filter of Algorithm 2. From the figure, we can clearly see that there is convergence of the averaged filter's error to the optimal filter's error as $\epsilon$ decreases,

[Figure 4.1: optimal filter vs. averaged; horizontal axis epsilon, log scale from 1e-04 to 1e+00.]

Fig. 4.1. The error of the averaged filter $\hat\Theta_k^{\text{avg}}$ becomes closer to that of the optimal filter $\hat\Theta_k^{\text{opt}}$ as $\epsilon \searrow 0$. Also, the error of the optimal filter decreases as $\epsilon \searrow 0$, indicating that SNR is increasing.



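\[ 0 \le \mathbb{E}^\epsilon \mathbf{1}_{\{\hat\Theta_k^{\text{avg}} \ne \Theta(t_k)\}} - \mathbb{E}^\epsilon \mathbf{1}_{\{\hat\Theta_k^{\text{opt}} \ne \Theta(t_k)\}} \to 0 \quad \text{as } \epsilon \searrow 0. \]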



It should also be noted that we can easily identify the signal-to-noise ratio (SNR) for this problem because it is linear. In fact, in Figure 4.1 we see a decrease in the optimal filter's error as $\epsilon$ decreases, indicating that the variance of the estimator decreases as well.

It is interesting to compare similarities and differences between the filter in [19] (described in Section 2.5.2) and the one in this paper. The two filters are quite different because they have different scalings, and because the filter in [19] is derived from a strong-sense limit whereas the one in this paper is derived under the weak topology. Also, averaging over $h(x)$ makes the filter in this paper simpler and easier to implement than the one in [12] where no such averaging occurs. However, these filters are similar because both of their performances are contingent on the goodness of the asymptotic approximation. Indeed, the ability of the reduced-dimension filter to improve as the rate of mean-reversion increases is seen in the simulations of both papers. In fact, [19] has a plot that is similar to Figure 4.1.



Our conclusion from this section is that the averaged filter can perform well, but its closeness to the optimal filter will depend on $\epsilon$. Whether or not optimality is essential will depend on the problem and the user's specific needs; it might be merely an extra 1%-2% in error that is under scrutiny.



4.4. Search for the Optimal $\Delta t$. In this section we run simulations in an attempt to find the optimal rate at which observations should be sampled. The simulation and parameters are the same as they were in Section 4.3 except we will vary $\Delta t$. The emphasis will be on the choice of $\Delta t$ which minimizes filtering error,

\[ \min_{\Delta t} \mathbb{E}^\epsilon \mathbf{1}_{\{\hat\Theta_k \ne \Theta(t_k)\}}. \tag{4.3} \]

Fig. 4.2. A realization where sampling too slow ($\Delta t = .01$) or too fast ($\Delta t = .0001$) will not be optimal. The black line is the observations $\Delta Y_t$, the red line is the true state $\Theta$, the green line is the filter obtained by sampling observations very slowly, and the blue line is the filter obtained by sampling too fast. Notice how the green line sometimes tracks noise.



Simply stated, sampling too fast will eliminate any hypothesis regarding a fast mean-reverting regime, while sampling too slowly will cause the filter's memory to deteriorate and the performance will be poor. Therefore, given all the other parameters, the user must choose the value of $\Delta t$ that gives the averaged filter a good chance to perform well.

The first thing to notice is the potential for the filter to track noise. In Figure 4.2 two filters are plotted alongside the true state and the observations. One of the filters takes $\Delta t = .01$ and returns an estimate that misses the transition in $\Theta$. The other filter takes $\Delta t = .0001$, which is entirely too fast for the fast-mean-reverting hypothesis and results in the tracking of noise.

Ideally, the averaged filter will be implemented with a priori knowledge of the optimal sampling rate, but analytical expressions for such optima are not known. Instead, the simulation is run several times with different $\Delta t$'s, and the errors are compared. Figure 4.3 shows the optimal $\Delta t$ to be around .002 for this model ($\Delta t = m\,\delta t = m \times .0001 = .002$). For each $m$ in Figure 4.3 the averaged filter is computed with $R = 20 \times \log_2(m)$ as the number of particles.

The conclusion is that $\Delta t$ cannot be too small; otherwise, the fast filter becomes more erroneous. Intuitively, it makes sense if one understands that some 'coarse-graining' is required to ensure that the fast-averaging of the model's integrals will occur between observation times. For this reason, the averaged filter does not apply for filters based on a continuum of observations. In practice, the user must consider the model's parameters and make a decision on $\Delta t$'s value.

Fig. 4.3. The 0-1 error for $\epsilon = .01$ and varying $\Delta t = m\,\delta t = m \times 10^{-4}$. The solid line is the error from the simulation, and the dotted line is a spline included so that the low-point is clearer.

5. Conclusion. We have considered a filtering problem for a hidden Markov model where one of the 
hidden states is fast mean-reverting with a shifting mean given by a Markov chain that is also a hidden process 
that models regime change. We have derived an averaged filter of low complexity for tracking directly the 
regime changes. We have given examples of where this model may be useful and we have also provided some 
related examples to show that subtle changes to the model will change the asymptotic behavior of the filter. 
We have also given a rigorous treatment and proof to show that the averaged filter is asymptotically optimal. 

We have also presented the results of some numerical simulations to show how the averaged filter performs 
as the rate of mean reversion increases. These simulations indicate that the averaged filter is a powerful 
numerical method if used correctly. Specifically, the filter's performance depends in large part on parameter 
values for the rate of mean-reversion and the rate of observations. 

In the future, possible research on nonlinear filters for HMMs with fast mean-reverting states includes 
robustness analysis, parameter estimation and the EM algorithm, and further application of the algorithms 
to financial data. 

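Appendix A. Tightness of Marginal Measures and Weak Convergence to a Unique Limit.

This section contains the lemmas and theorems necessary to prove (3.4). In doing so, it is useful to introduce a marginal probability space $(\tilde\Omega, \tilde{\mathcal{F}}_t, \tilde{\mathbb{P}})$ for the non-Markov process $(\Theta^\epsilon(t), V^\epsilon(t))$, where

\[ \tilde\Omega = D([0,T];S) \times C([0,T];\mathbb{R}) \]

and we define the measures

\[ P^\epsilon(A) = \mathbb{P}\big((\Theta^\epsilon(\cdot), V^\epsilon(\cdot)) \in A\big) \]

for all $\epsilon > 0$ and $A \in \tilde{\mathcal{F}}_T$. It can be shown that the process $\Theta^\epsilon(\cdot)$ is tight in the space $D([0,T];S)$ and $V^\epsilon(\cdot)$ is tight in the space $C([0,T];\mathbb{R})$, which therefore makes the family $(P^\epsilon)_{\epsilon>0}$ tight in $\tilde\Omega$. Furthermore, for any bounded and differentiable $g : S \times \mathbb{R} \to \mathbb{R}$, we can define the following continuous functional for any path $(\theta(\cdot), v(\cdot)) \in \tilde\Omega$,

\[ \phi_g(t) = g(\theta(t), v(t)) - g(\theta(0), v(0)) - \int_0^t \left(\bar{Q} + \bar{h}_{\theta(\tau)}\frac{\partial}{\partial v}\right) g(\theta(\tau), v(\tau))\,d\tau. \tag{A.1} \]

Under the measure $P^\epsilon$ the expectation of $\phi_g(t)$ is defined as

\[ \mathbb{E}^\epsilon \phi_g(t) = \int \phi_g(t)\,dP^\epsilon, \]

for which we will show that for any $t, s \in [0,T]$ with $t > s$,

\[ \mathbb{E}^\epsilon\left[ \phi_g(t) - \phi_g(s) \,\middle|\, \mathcal{F}_s \right] \to 0 \quad \text{as } \epsilon \searrow 0 \]

in probability, which is sufficient to conclude that for any weakly convergent subsequence of $(P^\epsilon)_{\epsilon>0}$ with limit $P$, we have that $\phi_g$ is a $P$-martingale. Therefore, from the tightness of the marginal measures $(P^\epsilon)_{\epsilon>0}$ it follows that

\[ (\Theta^\epsilon(\cdot), V^\epsilon(\cdot)) \Rightarrow (\Theta(\cdot), V(\cdot)) \]

where $(\Theta(\cdot), V(\cdot))$ uniquely solves the $P$-martingale problem associated with $\phi_g$ in (A.1), and is a Markov process with generator $\bar{Q} + \bar{h}_\theta\frac{\partial}{\partial v}$.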

To summarize, the goal of this section is to prove (3.4) by showing that the marginal measures $(P^\epsilon)_{\epsilon>0}$ are tight in $\tilde\Omega$ and that every convergent subsequence must converge to the measure of a process $(\Theta(\cdot), V(\cdot))$ which is the unique solution to a specific martingale problem. This is a well known method of proving weak convergence and is presented in [28], for example.

A.1. Tightness. General results regarding tightness of measures and weak convergence can be found in the book by Ethier and Kurtz [9] (Chapter 3). In fact, the proof of the tightness of $\Theta^\epsilon(\cdot)$ that is shown in this section is a specific case of Lemma 6.1 on page 122 of their book.

Tightness of a family of measures means that for all $\delta > 0$ there exists a compact set $K_\delta \subset \tilde\Omega$ such that

\[ \sup_{\epsilon > 0} P^\epsilon(K_\delta^c) < \delta. \]

It turns out that because of the bound in (2.3), the family of measures is tight in $\tilde\Omega$.

Lemma A.1. Assuming that the initial condition $V_0$ is almost-surely bounded, the bound in (2.3) ensures that $(P^\epsilon)_{\epsilon>0}$ is tight in $\tilde\Omega$.

Proof of Lemma A.1: First, let's prove that $V^\epsilon(\cdot)$ is tight in $C([0,T];\mathbb{R})$. Since there exists a positive constant $\kappa < \infty$ such that $\mathbb{P}(|V_0| \le \kappa) = 1$ and because $h$ is bounded, the function $V^\epsilon(t)$ is uniformly bounded,

\[ |V^\epsilon(t)| \le \kappa + T\|h\|_\infty \quad \forall t \in [0,T],\ \forall \epsilon > 0. \]

Furthermore, $V^\epsilon(t)$ is uniformly equicontinuous with modulus of continuity $\|h\|_\infty$,

\[ |V^\epsilon(t) - V^\epsilon(s)| \le \|h\|_\infty |t - s| \quad \forall t, s \in [0,T],\ \forall \epsilon > 0. \]

Therefore, by the Ascoli theorem, any subsequence of $V^\epsilon(\cdot)$ has a further subsequence that converges uniformly, and therefore the support of all measures $P^\epsilon$ on $V^\epsilon$ is compact. Therefore, for any $\delta > 0$ there is a set $K_\delta(V)$ such that

\[ P^\epsilon(K_\delta(V)) > 1 - \delta/2 \quad \forall \epsilon > 0. \]

Next, to prove compactness of $\Theta^\epsilon(\cdot)$, consider any $f \in D([0,T];S)$ and define the switching times $\{\tau_k(f)\}_{k=0}^\infty$, for $k = 1, 2, 3, \dots$,

\[ \tau_k(f) = \begin{cases} \inf\{ s \in (\tau_{k-1}(f), T] : f(s) \ne f(s^-) \text{ or } s = T \} & \text{if } \tau_{k-1}(f) < T \\ T & \text{otherwise} \end{cases} \]

with $\tau_0(f) = 0$. For any compact set $U \subset \mathbb{R}$, let the set $A(U, \delta)$ be defined as

\[ A(U, \delta) = \{ f : f(s) \in U \text{ for all } s \le T,\ \tau_k(f) - \tau_{k-1}(f) > \delta\ \ \forall k \text{ with } \tau_k(f) < T \}. \]

The closure of $A(U, \delta)$ is compact:

Given any sequence $f_n \in A(U, \delta)$, by a diagonal argument there exists a subsequence $f_{n_\ell}$ and limits $t_k$ and $a_k$ such that

\[ \tau_k(f_{n_\ell}) \to t_k \in [0,T] \]

and

\[ f_{n_\ell}(\tau_k(f_{n_\ell})) \to a_k \in U \]

as $\ell \nearrow \infty$. Therefore,

\[ f_{n_\ell}(s) \to a_k \quad \text{for } s \in [t_k, t_{k+1}), \]

which proves that $A(U, \delta)$ is sequentially compact, and since $D([0,T];S)$ is equipped with the point-wise metric, the subset is therefore compact.

For the Markov process $\Theta^\epsilon(\cdot)$, take $U$ s.t. $\{s_1, \dots, s_m\} \subset U$ and recall that

\[ P^\epsilon(A(U, \delta)^c) \le 1 - e^{-\beta\delta} \le \beta\delta \quad \forall \epsilon > 0, \]

which shows that $\Theta^\epsilon(\cdot)$ is compact in $D([0,T];S)$.

Finally, tightness of $(P^\epsilon)_{\epsilon>0}$ follows by considering the compact set $K_\delta(V) \times A(U, \delta/2\beta)$,

\[ P^\epsilon\Big( \big(K_\delta(V) \times A(U, \delta/2\beta)\big)^c \Big) \le P^\epsilon\big(K_\delta(V)^c\big) + P^\epsilon\big(A(U, \delta/2\beta)^c\big) < \delta/2 + \delta/2 = \delta. \]



A.2. Convergence of the Martingale Problem. In this section it will be shown by Theorem A.3 that for any $t, s \in [0,T]$ with $t > s$,

\[ \mathbb{E}^\epsilon\left[ \phi_g(t) - \phi_g(s) \,\middle|\, \mathcal{F}_s \right] \to 0 \quad \text{as } \epsilon \searrow 0 \]

in probability.

To prove that the family of marginal measures will converge to the unique solution of the martingale problem, the following lemma will be necessary:
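Lemma A.2. For any positive $\Delta t > 0$ and any function $g$ such that $\|g\|_\infty + \left\|\frac{\partial}{\partial v}g\right\|_\infty \le C$, the expectation of the following terms converge to a term of $o(\Delta t)$ as $\epsilon \searrow 0$:

\[ \lim_{\epsilon \searrow 0} \left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \right| \le 2\beta^2 \Delta t^2, \]

\[ \lim_{\epsilon \searrow 0} \left| E\left[ \int_0^{\Delta t} \big(h(X^\epsilon(s)) - \bar{h}_{\Theta^\epsilon(s)}\big) \frac{\partial}{\partial v} g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \right| \le 2\beta^2 \Delta t^2, \]

in probability.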



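Proof of Lemma A.2: Start by proving the first statement. By iteratively conditioning the $\sigma$-algebras, the expectation can be separated into two parts,

\[
\begin{aligned}
\left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \right|
\le{} & \left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \cap \{\Theta^\epsilon(s) = \Theta^\epsilon(0)\ \forall s \in [0, \Delta t]\} \right] \right| \\
& \times \mathbb{P}\big(\Theta^\epsilon(s) \equiv \Theta^\epsilon(0) \text{ for } s \in [0, \Delta t]\big) \\
& + \left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \cap \{\max_s |\Theta^\epsilon(s) - \Theta^\epsilon(0)| > 0\} \right] \right| \\
& \times \Big(1 - \mathbb{P}\big(\Theta^\epsilon(s) = \Theta^\epsilon(0) \text{ for } s \in [0, \Delta t]\big)\Big).
\end{aligned}
\]

Now, it is obvious that $\mathbb{P}(\Theta^\epsilon(s) = \Theta^\epsilon(0) \text{ for } s \in [0, \Delta t]) \le 1$, and from the bound in (2.3) it follows that

\[ 1 - \mathbb{P}\big(\Theta^\epsilon(s) = \Theta^\epsilon(0) \text{ for } s \in [0, \Delta t]\big) \le 1 - e^{-\beta\Delta t}. \]

On intervals where $\Theta^\epsilon(t)$ is constant, we have

\[ X^\epsilon(t) =_d \Theta^\epsilon(0) + e^{-t/\epsilon}\big(x - \Theta^\epsilon(0)\big) + \mathcal{N}\Big(0, \tfrac{1}{2}\big(1 - e^{-2t/\epsilon}\big)\Big), \]

and on these intervals the densities of $X^\epsilon(t)$ given $X(0) = x$ can be written as $\mu_{t/\epsilon}(y|x) \in \mathbb{R}^{m \times m}$ as

\[ \mu_{t/\epsilon}(y|x) = \frac{1}{\sqrt{\pi(1 - e^{-2t/\epsilon})}}\,\mathrm{diag}\left( \exp\left\{ -\frac{\big(y - s_i - e^{-t/\epsilon}(x - s_i)\big)^2}{1 - e^{-2t/\epsilon}} \right\} \right)_{i=1,\dots,m}. \]

From here, there is the following upper-bound,

\[
\begin{aligned}
\left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \right|
\le{} & \left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \cap \{\Theta^\epsilon(s) = \Theta^\epsilon(0)\ \forall s \in [0, \Delta t]\} \right] \right| + 2\beta\Delta t\big(1 - e^{-\beta\Delta t}\big) \\
={} & \left| \int_0^{\Delta t} \left( \int \mu_{s/\epsilon}(y|X^\epsilon(0))\,Q(y)\,dy - \bar{Q} \right) g(\Theta^\epsilon(0), V^\epsilon(0))\,ds \right| + 2\beta\Delta t\big(1 - e^{-\beta\Delta t}\big) \\
\le{} & \underbrace{\|g\|_\infty \int_{\epsilon^\alpha}^{\Delta t} \left\| \int \mu_{s/\epsilon}(y|X^\epsilon(0))\,Q(y)\,dy - \bar{Q} \right\| ds}_{=:\ \Delta t \cdot \gamma(\Delta t, \epsilon)} + \underbrace{2\beta\Delta t\big(1 - e^{-\beta\Delta t}\big)}_{\le\ 2\beta^2\Delta t^2} + \underbrace{\|g\|_\infty \sup_x \|Q(x) - \bar{Q}\|}_{=\ C'}\,\epsilon^\alpha
\end{aligned}
\]

where $0 < \alpha < 1$. It is clear that $\gamma(\Delta t, \epsilon) \to 0$ as $\epsilon \searrow 0$ for all $\Delta t > 0$ in probability. Therefore, we have

\[ \left| E\left[ \int_0^{\Delta t} \big(Q(X^\epsilon(s)) - \bar{Q}\big) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \right| \le \Delta t \cdot \gamma(\Delta t, \epsilon) + 2\beta^2\Delta t^2 + C'\epsilon^\alpha \to 2\beta^2\Delta t^2, \]

which proves the first statement of the lemma.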

Repeating these steps with $Q(X^\epsilon(s))$ replaced by $h(X^\epsilon(s))$ and $\bar{Q}$ replaced by $\bar{h}_{\Theta^\epsilon(s)}\frac{\partial}{\partial v}$, the second statement in the lemma is proven provided that there is a constant such that

\[ \left\| \frac{\partial}{\partial v} g \right\|_\infty \le C. \]

With Lemma A.2, the key calculation for proving weak convergence of $(\Theta^\epsilon(\cdot), V^\epsilon(\cdot))$ can be shown. Namely, from the time-homogeneity and Markov property of the solution to the martingale problem associated with $\phi_g$, it is sufficient to show that the expectation under $P^\epsilon$ of the function $\phi_g(t)$ conditioned on $\mathcal{F}_0$ converges to zero as $\epsilon \searrow 0$.

Theorem A.3. For any function $g(\theta, v)$ with $\|g\|_\infty + \left\|\frac{\partial}{\partial v}g\right\|_\infty \le C$, condition (2.3) and the boundedness of $h$ ensure that for any $t, s \in [0,T]$ with $t > s$,

\[ \mathbb{E}^\epsilon\left[ \phi_g(t) - \phi_g(s) \,\middle|\, \mathcal{F}_s \right] \to 0 \]

as $\epsilon \searrow 0$ in probability.



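Proof of Theorem A.3: Without loss of generality we take $s = 0$ and any $t \in [0,T]$. For a fixed step-size $\Delta t > 0$, we consider the following discretization of the interval $[0,t]$,

\[ 0 < \Delta t < 2\Delta t < \dots < N\Delta t = t. \]

For any $n \le N$, let $t_n = n\Delta t$. We then have a collapsing sum

\[
\begin{aligned}
\mathbb{E}^\epsilon[\phi_g(t) \,|\, \mathcal{F}_0]
={} & E\big[ g(\Theta^\epsilon(t), V^\epsilon(t)) - g(\Theta^\epsilon(0), V^\epsilon(0)) \,\big|\, \mathcal{F}_0 \big]
- E\left[ \int_0^t \left(\bar{Q} + \bar{h}_{\Theta^\epsilon(s)}\frac{\partial}{\partial v}\right) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \\
={} & E\left[ \sum_{n=0}^{N-1} g(\Theta^\epsilon(t_{n+1}), V^\epsilon(t_{n+1})) - g(\Theta^\epsilon(t_n), V^\epsilon(t_n)) \,\middle|\, \mathcal{F}_0 \right]
- E\left[ \sum_{n=0}^{N-1} \int_{t_n}^{t_{n+1}} \left(\bar{Q} + \bar{h}_{\Theta^\epsilon(s)}\frac{\partial}{\partial v}\right) g(\Theta^\epsilon(s), V^\epsilon(s))\,ds \,\middle|\, \mathcal{F}_0 \right] \\
={} & E\left[ \sum_{n=0}^{N-1} \Big( E\big[ g(\Theta^\epsilon(t_{n+1}), V^\epsilon(t_{n+1})) \,\big|\, \mathcal{F}_{t_n} \big] - g(\Theta^\epsilon(t_n), V^\epsilon(t_n)) \Big) \,\middle|\, \mathcal{F}_0 \right] \qquad (*) \\
& - E\left[ \sum_{n=0}^{N-1} \int_{t_n}^{t_{n+1}} E\left[ \left(\bar{Q} + \bar{h}_{\Theta^\epsilon(s)}\frac{\partial}{\partial v}\right) g(\Theta^\epsilon(s), V^\epsilon(s)) \,\middle|\, \mathcal{F}_{t_n} \right] ds \,\middle|\, \mathcal{F}_0 \right]. \qquad (**)
\end{aligned}
\]

The summand in (**) can be shown to be equal to the summand in (*) plus a correction term,

\[
\begin{aligned}
\int_{t_n}^{t_{n+1}} & E\left[ \left(\bar{Q} + \bar{h}_{\Theta^\epsilon(s)}\frac{\partial}{\partial v}\right) g(\Theta^\epsilon(s), V^\epsilon(s)) \,\middle|\, \mathcal{F}_{t_n} \right] ds \\
={} & \underbrace{\int_{t_n}^{t_{n+1}} E\left[ \left(\bar{Q} + \bar{h}_{\Theta^\epsilon(s)}\frac{\partial}{\partial v} - Q(X^\epsilon(s)) - h(X^\epsilon(s))\frac{\partial}{\partial v}\right) g(\Theta^\epsilon(s), V^\epsilon(s)) \,\middle|\, \mathcal{F}_{t_n} \right] ds}_{=\ e_n(\epsilon, \Delta t)} \\
& + \int_{t_n}^{t_{n+1}} E\left[ \left(Q(X^\epsilon(s)) + h(X^\epsilon(s))\frac{\partial}{\partial v}\right) g(\Theta^\epsilon(s), V^\epsilon(s)) \,\middle|\, \mathcal{F}_{t_n} \right] ds \\
={} & E\big[ g(\Theta^\epsilon(t_{n+1}), V^\epsilon(t_{n+1})) \,\big|\, \mathcal{F}_{t_n} \big] - g(\Theta^\epsilon(t_n), V^\epsilon(t_n)) + e_n(\epsilon, \Delta t).
\end{aligned}
\]

By Lemma A.2 the correction term $e_n(\epsilon, \Delta t)$ can be controlled so that the limit as $\epsilon \searrow 0$ is $o(\Delta t)$ in probability. Therefore,

\[ \big| \mathbb{E}^\epsilon[\phi_g(t) \,|\, \mathcal{F}_0] \big| \le \sum_{n=0}^{N-1} |e_n(\epsilon, \Delta t)| \to \sum_{n=0}^{N-1} o(\Delta t) = O(\Delta t) \quad \text{as } \epsilon \searrow 0, \]

and the entire sum converges to something of order $O(\Delta t)$ as $\epsilon \searrow 0$ in probability, which proves the theorem since $\Delta t$ is arbitrarily small.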



Theorem A.3 confirms that any convergent sequence in the family of tight measures $(P^\epsilon)_{\epsilon>0}$ must have a limit $P$ which is the measure for which paths in the space $\tilde\Omega$ uniquely solve the martingale problem associated with $\phi_g$. Therefore, this proves (3.4) and the reduced filter of Theorem 3.1 stands.



Appendix B. The Rao-Blackwellized Filter for Linear Observations ($h(x) = h \cdot x$). The Rao-Blackwellized filter for a general class of target tracking problems is covered in [14, 27]. The Rao-Blackwellized filter will be specific to the model in question; this appendix covers the case for the OU model of this paper.

Using auto-regressive coefficient $a = e^{-\delta t/\epsilon}$, we have the following recursion for Algorithm 1:

\[
\begin{aligned}
X^\epsilon_{(k+1)m} &= a X^\epsilon_{(k+1)m-1} + (1-a)\Theta_{(k+1)m} + \sqrt{\tfrac{1-a^2}{2}}\,W_{(k+1)m} \\
&= a^2 X^\epsilon_{(k+1)m-2} + (1-a)\Theta_{(k+1)m} + (1-a)a\,\Theta_{(k+1)m-1} + \sqrt{\tfrac{1-a^2}{2}}\big(W_{(k+1)m} + aW_{(k+1)m-1}\big) \\
&\ \ \vdots \\
&= a^m X^\epsilon_{km} + (1-a)\sum_{\ell=1}^m a^{m-\ell}\,\Theta_{km+\ell} + \sqrt{\tfrac{1-a^2}{2}}\sum_{\ell=1}^m a^{m-\ell}\,W_{km+\ell},
\end{aligned}
\]

where $W_\ell \sim \text{iid } N(0,1)$. Using this recursive formula, the values of $X^\epsilon$ and $\Theta$ that occur at the $m$-many times between observations can be stored in vectors $X^\epsilon_{k+1} = (X^\epsilon_{(k+1)m}, X^\epsilon_{(k+1)m-1}, \dots, X^\epsilon_{km+1})^T$ and $\Theta_{k+1} = (\Theta_{(k+1)m}, \Theta_{(k+1)m-1}, \dots, \Theta_{km+1})^T$, where '$T$' denotes matrix/vector transpose. The matrix/vector form of the recursion is

\[ X^\epsilon_{k+1} = AX^\epsilon_k + B\Theta_{k+1} + RW_{k+1} \]

where $W_{k+1} = (W_{(k+1)m}, W_{(k+1)m-1}, \dots, W_{km+1})^T$, and the matrices are



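\[
A = \begin{pmatrix} a^m & 0 & \cdots & 0 \\ a^{m-1} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a & 0 & \cdots & 0 \end{pmatrix},
\qquad
B = (1-a)\begin{pmatrix} 1 & a & \cdots & a^{m-2} & a^{m-1} \\ 0 & 1 & \cdots & a^{m-3} & a^{m-2} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & a \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix},
\]

and

\[
R = \sqrt{\frac{1-a^2}{2}}\begin{pmatrix} 1 & a & \cdots & a^{m-2} & a^{m-1} \\ 0 & 1 & \cdots & a^{m-3} & a^{m-2} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & a \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.
\]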







Observations are given by

\[ Y_{k+1} - Y_k = HX^\epsilon_{k+1} + \Delta Z_k \]

with $\Delta Z_k \sim \text{iid } N(0, \Delta t)$, and $H = \delta t \cdot (h, h, h, \dots, h)$.
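As a sanity check on the block form of the recursion, the following Python sketch builds the three matrices and applies one block step. It assumes the newest-first ordering of the vectors described above; the function names `build_matrices`, `matvec` and `block_step` are hypothetical, and plain nested lists stand in for a matrix library.

```python
import math


def build_matrices(a, m):
    """Return (A, B, R) for X_{k+1} = A X_k + B Theta_{k+1} + R W_{k+1}.

    Vectors are ordered newest-first, so row i refers to fine-grid index (k+1)m - i.
    """
    upper = [[a ** (j - i) if j >= i else 0.0 for j in range(m)] for i in range(m)]
    A = [[a ** (m - i) if j == 0 else 0.0 for j in range(m)] for i in range(m)]
    B = [[(1.0 - a) * u for u in row] for row in upper]
    scale = math.sqrt((1.0 - a * a) / 2.0)
    R = [[scale * u for u in row] for row in upper]
    return A, B, R


def matvec(M, v):
    """Multiply a nested-list matrix by a list vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def block_step(A, B, R, x_prev, theta_new, w_new):
    """One block update; all vectors are newest-first."""
    ax, bt, rw = matvec(A, x_prev), matvec(B, theta_new), matvec(R, w_new)
    return [p + q + r for p, q, r in zip(ax, bt, rw)]


A, B, R = build_matrices(0.5, 3)
x_new = block_step(A, B, R, [2.0, 0.0, 0.0], [1.0, -1.0, 1.0], [0.1, -0.2, 0.3])
```

Only the first entry of `x_prev` (the most recent fine-grid value) influences the result, which reflects the single non-zero column of $A$.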

The idea of Rao-Blackwellization is to generate particles of $\Theta$, and then exploit the remaining linearity. For each sample-path $\Theta^{(n)}_{1:k}$, there corresponds a weight $\omega_k^{(n)}$, and a Kalman filter $(\hat{X}_k^{(n)}, \Sigma_k)$, where $\hat{X}_k^{(n)}$ is the conditional mean of $X^\epsilon_k$ given the data and the $n$th sample-path, and $\Sigma_k$ is the corresponding conditional covariance matrix.

The covariance matrix $\Sigma_k$ evolves independently of $Y$, and it also evolves independently of $\Theta^{(n)}$ for the particular model that we are implementing. Therefore, $\Sigma_k$ does not depend on the data or the samples. However, $\hat{X}_k^{(n)}$ does depend on the data and samples. Algorithm 3 lays out the steps for programming such a filter.



Algorithm 3 Rao-Blackwellized Particle Filter Algorithm at $(k+1)$th observation

$\Sigma_{k+1|k} = A\Sigma_kA^T + RR^T$ //compute the covariance of $X^\epsilon_{k+1}$ given $\mathcal{F}_k$

$S_{k+1} = H\Sigma_{k+1|k}H^T + \Delta t$ //compute covariance of $Y_{k+1} - Y_k$ given $\mathcal{F}_k$

$K_k = \Sigma_{k+1|k}H^T / S_{k+1}$ //compute Kalman filter gain operator

$\Sigma_{k+1} = (I - K_kH)\Sigma_{k+1|k}$ //compute the covariance of $X^\epsilon_{k+1}$ given $\mathcal{F}_{k+1}$

for $n = 1 \to R$ do

for $\ell = 0 \to m - 1$ do

sample $\Theta^{(n)}_{mk+\ell+1}$ given $\Theta^{(n)}_{mk+\ell}$ //sequentially update the particle

end for

$\hat{X}^{(n)}_{k+1|k} = A\hat{X}^{(n)}_k + B\Theta^{(n)}_{k+1}$ //predict the state at the next time step

$\tilde\omega^{(n)}_{k+1} = \exp\left\{ -\frac{1}{2}\,\frac{\big(y_{k+1} - y_k - H\hat{X}^{(n)}_{k+1|k}\big)^2}{S_{k+1}} \right\} \cdot \omega^{(n)}_k$ //compute unnormalized weight

$\hat{X}^{(n)}_{k+1} = \hat{X}^{(n)}_{k+1|k} + K_k\big(y_{k+1} - y_k - H\hat{X}^{(n)}_{k+1|k}\big)$ //update the filter

end for

$c_{k+1} = \sum_n \tilde\omega^{(n)}_{k+1}$ //compute normalization weight

$\omega_{k+1} = \tilde\omega_{k+1}/c_{k+1}$ //normalize

// perform SIR if necessary

if $1/\sum_n \big(\omega^{(n)}_{k+1}\big)^2 < \eta R$ then

replace $\big\{(\Theta^{(n)}_{m(k+1)}, \hat{X}^{(n)}_{k+1})\big\}_{n=1}^R$ with an $\{\omega^{(n)}_{k+1}\}_{n=1}^R$-weighted bootstrap sample of size $R$

replace $\{\omega^{(n)}_{k+1}\}_{n=1}^R$ with $(1, 1, \dots, 1)/R$
end if
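The four covariance lines at the top of Algorithm 3 do not depend on the particles and can be computed once per observation. The sketch below illustrates them for a scalar observation; the function name `kalman_covariance_step` is hypothetical, plain nested lists stand in for a matrix library, and the last line assumes that the incoming covariance is symmetric.

```python
def kalman_covariance_step(Sigma, A, R, H, dt_obs):
    """Covariance part of one Rao-Blackwellized step; returns (Sigma_new, K, S).

    Sigma, A, R : m-by-m nested lists; H : list of length m; dt_obs : noise variance.
    """
    m = len(Sigma)
    AS = [[sum(A[i][k] * Sigma[k][j] for k in range(m)) for j in range(m)]
          for i in range(m)]
    # predicted covariance: A Sigma A^T + R R^T
    Sigma_pred = [
        [sum(AS[i][k] * A[j][k] for k in range(m))
         + sum(R[i][k] * R[j][k] for k in range(m))
         for j in range(m)]
        for i in range(m)
    ]
    # innovation variance: H Sigma_pred H^T + dt_obs (scalar observation)
    SH = [sum(Sigma_pred[i][j] * H[j] for j in range(m)) for i in range(m)]
    S = sum(H[i] * SH[i] for i in range(m)) + dt_obs
    K = [v / S for v in SH]   # Kalman gain, stored as a column vector
    # (I - K H) Sigma_pred, using the symmetry of Sigma_pred so that H Sigma_pred = SH
    Sigma_new = [[Sigma_pred[i][j] - K[i] * SH[j] for j in range(m)] for i in range(m)]
    return Sigma_new, K, S


# One-dimensional example with hypothetical numbers.
Sigma_new, gain, innovation_var = kalman_covariance_step(
    [[1.0]], [[0.5]], [[1.0]], [2.0], 1.0
)
```

In the one-dimensional example the predicted variance is 1.25 and the innovation variance is 6, so the updated variance is 1.25/6.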